> If we define hallucinations as falsehoods introduced between the training data and LLM output,
Yes, if.
Or we could realize that the LLM's output is a random draw from a distribution learned from the training data, i.e. ALL of its outputs are hallucinations. It has no concept of truth or falsehood.
I think what you are saying here is that because it has no "concept" (I'll assume that means internal model) of truth, there is no possible way of improving the truthiness of an LLM's outputs.
However, we do know that LLMs possess viable internal models, as I linked to in the post you are responding to. The OP paper notes that the probes it uses find the strongest signal of truth (where "truth" is defined by whatever the correct answer on each benchmark is) in the middle layers of the model, at the activations of these "exact answer" tokens. That is, inside the LLM we have something which statistically correlates with whether the LLM's output matches "benchmark truth". Assuming you are willing to grant that "concept" and "internal model" are pretty much the same thing, this sure sounds like a concept of "benchmark truth" at work. If you aren't willing to grant that, I have no idea what you mean by "concept".
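To make "probe" concrete, here is roughly the kind of setup involved. This is a minimal sketch, not the paper's code: GPT-2, the toy statements, and reading the last token at a middle layer (instead of the paper's "exact answer" tokens) are all my simplifications.

```python
# Sketch of a linear truth probe on middle-layer activations.
# Assumptions: GPT-2 as a stand-in model, invented labeled statements,
# and the final-token activation as a proxy for the "exact answer" token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def middle_layer_activation(text: str) -> torch.Tensor:
    """Hidden state of the final token at a middle layer."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    layers = out.hidden_states          # embeddings + one tensor per layer
    mid = layers[len(layers) // 2]      # pick a middle layer
    return mid[0, -1]                   # batch 0, last token

# Invented toy data: statements paired with "benchmark truth" labels.
data = [
    ("Paris is the capital of France.", 1),
    ("Two plus two is four.", 1),
    ("The Atlantic Ocean is made of sand.", 0),
    ("Humans have three legs.", 0),
]

X = torch.stack([middle_layer_activation(s) for s, _ in data]).numpy()
y = [label for _, label in data]

# A linear probe: if truth is linearly decodable from these activations,
# this classifier beats chance on held-out statements.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.score(X, y))
```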
If you mean to say that humans have some model of Objective Truth which is inherently superior, I'd argue that isn't really the case. Human philosophers have been arguing for centuries over how to define truth, and don't seem to have come to any conclusion on the matter. In practice, people have wildly diverging definitions of truth, which depend on things like how religious or skeptical they are, what the standards for truth are in their culture, and various quirks of their own personality and life experience.
This paper only measured "benchmark truth" because that is easy to measure, but it seems reasonable to assume that other models of truth exist inside LLMs as well. Given that LLMs are supposed to replicate the words that humans wrote, I suspect that their internal models of truth work out to be some agglomeration (plus some noise) of what various humans think of as truth.
If that were the case, you couldn't give it a statement and ask whether that statement is true or not, and get back a response that is correct more often than not.
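That claim is easy to test empirically. A rough sketch of the experiment, where `ask_model` is a hypothetical placeholder for whatever chat-completion call you have access to:

```python
# Feed labeled statements to a model, ask "true or false?", and check whether
# accuracy beats a coin flip. `ask_model` is a stub; swap in a real API call.

def ask_model(prompt: str) -> str:
    # Placeholder so the sketch runs end to end; replace with a real model call.
    return "True"

statements = [  # invented examples with known labels
    ("The Moon orbits the Earth.", True),
    ("Sharks are mammals.", False),
]

correct = 0
for statement, label in statements:
    reply = ask_model(f"Answer with one word, True or False: {statement}")
    predicted = reply.strip().lower().startswith("true")
    correct += (predicted == label)

# Accuracy reliably above 50% is hard to square with "no concept of truth".
print(f"accuracy: {correct / len(statements):.0%}")
```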
If language communicates thoughts, thoughts have a relationship with reality, and that relationship might be true or false or something else.
Then what thought is LLM language communicating, to what reality does it bear a relationship, and what is the truth or falsity of that language?
To me, LLM-generated sentences have no truth value; they are literally strings, not thoughts.
Take the simple "user: how much is two plus two? assistant: two plus two is four". It may seem trivial, but how do you ascertain that that statement maps to 2+2=4? Do you make a leap of faith, or argue that the word "plus" maps to the addition function? What about "is": does it map to equality, even when it's the same token as in "water is wet" (where wet is not water)? Or are we arguing that the truthfulness lies in the embedding interpretation, where tokens and strings merely communicate points in the multidimensional embedding space, which could be said to be a thought, and we are now mapping some of the vectors in that space as true and some as false?
Let's assume LLMs don't "think". We feed an LLM an input and get back an output string. It is then possible to interpret that string as having meaning in the same way we interpret human writing as having meaning, even though we may choose not to. At that point, we have created a thought in our heads which could be true or false.
Now let's talk about calculators. We can think of calculators as similar to LLMs, but speaking a more restricted language and giving significantly more reliable results. The calculator takes a thought converted to a string as input from the user, and outputs a string, which the user then converts back into a thought. The user values that string because the thought it creates has high truthiness. People don't like buggy calculators.
I'd say one can view an LLM in exactly the same way; it just accepts a much richer language of thoughts, but outputs significantly buggier results.
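In code terms, the analogy is just that both expose the same string-to-string interface; the difference is the richness of the input language and the reliability of the output. A sketch, where all names are mine and `llm` is a stub for a real model call:

```python
# Both a calculator and an LLM are string -> string functions; the user supplies
# the interpretation of the output as a thought. Names and stubs are illustrative.

def calculator(expr: str) -> str:
    """Restricted input language, highly reliable output."""
    # eval of plain arithmetic, standing in for a real calculator
    return str(eval(expr, {"__builtins__": {}}, {}))

def llm(prompt: str) -> str:
    """Much richer input language, significantly buggier output."""
    return "two plus two is four"   # placeholder for a real model call

print(calculator("2 + 2"))               # "4" -- trusted almost unconditionally
print(llm("how much is two plus two?"))  # interpret, then decide how much to trust
```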