I think what you are saying here is that because it has no "concept" (I'll assume that means internal model) of truth, there is no possible way of improving the truthfulness of an LLM's outputs.
However, we do know that LLMs possess viable internal models, as I linked to in the post you are responding to. The OP paper notes that its probes find the strongest signal of truth (where truth is defined as whatever the correct answer on each benchmark is) in the middle layers of the model, at the activations of the "exact answer" tokens. That is, we have something inside the LLM which statistically correlates with whether its output matches "benchmark truth". Assuming you are willing to grant that "concept" and "internal model" are pretty much the same, this sure sounds like a concept of "benchmark truth" at work. If you aren't willing to grant that, I have no idea what you mean by "concept".
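To make "probing middle-layer activations at the answer token" concrete, here is a minimal sketch of the general technique (not the paper's actual code). The model name, layer index, and the toy examples are all illustrative assumptions; the idea is just to extract a hidden state at the answer token and see whether a linear classifier can predict correctness from it.

```python
# Minimal sketch of a linear truth probe on middle-layer activations.
# Model, layer index, and data are toy assumptions, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "gpt2"  # stand-in; the paper uses larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Toy (question, answer, is_correct) triples standing in for a benchmark.
examples = [
    ("The capital of France is", " Paris", 1),
    ("The capital of France is", " Berlin", 0),
    ("2 + 2 =", " 4", 1),
    ("2 + 2 =", " 5", 0),
]

layer = 6  # a "middle" layer for a 12-layer model
features, labels = [], []
for question, answer, is_correct in examples:
    ids = tokenizer(question + answer, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    # Hidden state of the chosen middle layer at the final (answer) token.
    features.append(out.hidden_states[layer][0, -1].numpy())
    labels.append(is_correct)

# Linear probe: does this activation linearly separate correct from incorrect answers?
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("probe train accuracy:", probe.score(features, labels))
```

If such a probe generalizes to held-out questions, that is the kind of statistical correlate of "benchmark truth" I'm pointing at.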
If you mean to say that humans have some model of Objective Truth which is inherently superior, I'd argue that isn't really the case. Human philosophers have been arguing for centuries over how to define truth and don't seem to have come to any conclusion on the matter. In practice, people have wildly diverging definitions of truth, which depend on things like how religious or skeptical they are, what the standards for truth are in their culture, and various quirks of their own personality and life experience.
This paper only measured "benchmark truth" because that is easy to measure, but it seems reasonable to assume that other models of truth exist within LLMs as well. Given that LLMs are trained to replicate the words that humans wrote, I suspect their internal models of truth work out to be some agglomeration (plus some noise) of what various humans think of as truth.