>In other words the limitation you talk about is not inherent in the technology, but in their choices.
I think it's somewhat inherent in the technology. At its core you're still trying to guess the next word / sentence / paragraph in a statistical manner with an LLM.
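To make the "statistical guessing" point concrete, here's a minimal sketch of the core loop: score every candidate continuation, softmax the scores into probabilities, pick the likeliest. The logits are invented for illustration; a real model computes them from the whole context.

```python
import math

# Made-up logits for candidate continuations of some prompt.
logits = {"cheese": 3.1, "lettuce": 1.2, "I don't know": -2.0}

def softmax(scores):
    # Convert raw scores into a probability distribution.
    exps = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(logits)

# The model emits whichever continuation is statistically most likely,
# regardless of whether it actually "knows" the answer.
best = max(probs, key=probs.get)
```

The point is that "I don't know" only ever wins this argmax if the training data made it the statistically likeliest continuation for that exact context, which it rarely is.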
Even if you trained it to say "I don't know" on a few questions, think about how this would affect the model overall. There's usually no good correlation between the input words and that answer. At most you could get it to say "I don't know" about obscure topics every once in a while, because there it's a somewhat more likely answer than it is for common knowledge.
Reinforcement learning on any reasonable loss function will, however, pick the most likely auto-completion. And something that sounds like it is based on the input is going to be more correlated with it (lower loss) than something that has no relation to the input, like "I don't know".
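The loss argument can be sketched in two lines. Cross-entropy loss is just the negative log of the probability the model assigns to the target continuation; the probabilities below are invented, but the shape of the comparison is the point.

```python
import math

def cross_entropy(prob_of_target):
    # Standard cross-entropy for a single target: -log(p).
    return -math.log(prob_of_target)

# A continuation that correlates with the prompt gets higher probability,
# hence lower loss, than an unrelated "I don't know". Numbers are made up.
loss_plausible_guess = cross_entropy(0.30)  # sounds related to the input
loss_i_dont_know = cross_entropy(0.02)      # barely correlated with input

# Training pushes the model toward the lower-loss output, so a
# confident-sounding guess beats honest ignorance.
assert loss_plausible_guess < loss_i_dont_know
```

So optimizing the loss systematically rewards plausible-sounding guesses over admissions of ignorance.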
It is an inherent problem in how LLMs work that they can't be trained to express non-knowledge, at least with the current techniques we're using to train them.
This is also why it's hard to tell DALL·E 3 what shouldn't be in the picture. Like the famous "no cheese" on the hamburger problem. Hamburgers and cheeseburgers are somewhat correlated, so the first image it spat out for "hamburger" was a cheeseburger. Saying "no cheese" only put more emphasis on cheese being correlated with the output, so the cheese never went away.
Because any word you use that shouldn't be in there causes the model to look for correlations to that word. It's, again, an inherent problem in the technology.
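A crude way to see why negative prompts backfire: imagine relevance as a bag-of-words sum of co-occurrence weights over every prompt token. The weights below are invented, and real image models are far more sophisticated, but the failure mode is the same shape.

```python
# Invented co-occurrence weights between prompt tokens and visual concepts.
cooccurrence = {
    "hamburger": {"cheese": 0.8, "bun": 0.9},
    "cheese": {"cheese": 1.0},
    "no": {},  # negation carries almost no visual signal of its own
}

def relevance(prompt_tokens, concept):
    # Sum the concept's weight over every token in the prompt; there is
    # no mechanism here for a token to *subtract* relevance.
    return sum(cooccurrence.get(tok, {}).get(concept, 0.0)
               for tok in prompt_tokens)

plain = relevance(["hamburger"], "cheese")                    # 0.8
negated = relevance(["hamburger", "no", "cheese"], "cheese")  # 1.8

# Adding "no cheese" raised, not lowered, the cheese score.
assert negated > plain
```

Mentioning "cheese" at all adds its correlations to the mix; the "no" contributes nearly nothing to cancel it out.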