The main problem with these LLMs is that they hallucinate information. For example, I asked about famous people from Chicago and it told me Barack Obama was born in Chicago. Of course he was born in Hawaii, but if you relaxed your eyes until the words on his Wikipedia page all blended together, you might accidentally think he was born in Chicago too.
The problem is, GPT-3 doesn’t have a knowledge graph of the world. It’s all just statistical guesses.
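To make the "statistical guesses" point concrete, here's a minimal sketch of what the model is actually doing under the hood. It assumes the pre-1.0 openai Python package and the old text-davinci-003 completion endpoint (the GPT-3-era API); the prompt is just an example. It asks for the single next token after "Barack Obama was born in" and prints the top candidates with their probabilities.

```python
# A sketch of what "statistical guesses" means in practice: ask the model
# to continue a prompt and look at the probability it assigns to each
# candidate next token. Assumes the pre-1.0 `openai` package and the old
# text-davinci-003 completion endpoint; adapt for newer SDK versions.
import math
import openai

openai.api_key = "sk-..."  # your API key

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt="Barack Obama was born in",
    max_tokens=1,
    temperature=0,
    logprobs=5,  # return the 5 most likely next tokens
)

# There is no fact lookup here, just a ranking of tokens by probability.
top_tokens = resp["choices"][0]["logprobs"]["top_logprobs"][0]
for token, logprob in sorted(top_tokens.items(), key=lambda kv: -kv[1]):
    print(f"{token!r}: {math.exp(logprob):.2%}")
```

Whatever tokens come out on top are there because of co-occurrence statistics, not because the model looked anything up.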
I didn't even ask where he was born; it volunteered that he was born in Chicago, so it went out of its way to make that mistake.
Then I said “Barack Obama was not born in Chicago” and it was able to synthesize the correct answer.
Then I wondered whether it was just being suggestible, so I said “Barack Obama was not born in Kenya” and it wrote an answer about the birther hoax. Good.
Then I asked it "what falsehoods have you told me", giving it a chance at redemption. It brought up the Obama birthplace error, but also, weirdly, claimed that Chicago isn't the third-largest city in the US but rather in Illinois, which is also incorrect.
That's not a very good test of hallucination, because there's likely to be a sentence on the internet (and thus in its training set) that answers the question directly. If you want to test for hallucination, you need to ask a novel question, or one whose answer is contingent on the context of the conversation.
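A proper test might look something like this: feed the model a made-up fact in the prompt and then ask a question whose answer depends on it, so the answer can't be lifted verbatim from the training set. A rough sketch, again assuming the old openai completion API; the fictional gadget, its price, and the prompt wording are just placeholders of mine.

```python
# A rough sketch of a context-contingent hallucination probe: the prompt
# invents a fact the model has never seen, then asks a question that can
# only be answered from that context. The "Glorbax 9000" and its price
# are made up for illustration. Assumes the pre-1.0 `openai` package.
import openai

openai.api_key = "sk-..."  # your API key

prompt = (
    "The Glorbax 9000 is a fictional kitchen gadget that costs $40 "
    "and was released in 2031.\n\n"
    "Q: How much would three Glorbax 9000s cost, and who manufactures them?\n"
    "A:"
)

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=80,
    temperature=0,
)

# The arithmetic ($120) is answerable from the context; the manufacturer
# is not. A model that confidently names a manufacturer is hallucinating.
print(resp["choices"][0]["text"].strip())
```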
In my case GPT-3 voluntarily failed the hallucination test, but that's kind of like crashing its car in the DMV parking lot. I didn't need to go further.