
When speaking with GPT-4, it is hard to deny the conjecture that its weights encode an internal world model somewhere.

If so, the difficulty is not that the model has no conception of truth and falsity; it is rather how to motivate the model to tell the truth. Or, more precisely, how to get the model to be honest: to say only things it believes to be true, things that are part of its world model.

Unfortunately, we can't just tell the model to be honest, since we can't distinguish between responses the model does and does not believe to be true. With RLHF fine-tuning, we can train the model to tend to give answers the human raters believe to be true. But we want the model to say what it believes to be true, not what it believes that we believe to be true!

For example, human raters may overwhelmingly rate response X as false, but the model, having read the entire Internet, may have come to the conclusion that X is true. So RLHF would train it to lie about X, to answer not-X instead of X.
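A minimal toy sketch of that failure mode (the labels and numbers are invented for illustration; this is not any real RLHF pipeline): the reward signal is derived from rater judgments, so the optimization pushes the policy toward whatever the raters endorse, regardless of what the model's own training data supports.

    # Hypothetical rater approval rates: fraction of raters who marked each
    # answer as true. These numbers are made up for illustration only.
    rater_approval = {
        "X": 0.1,      # raters overwhelmingly consider X false
        "not-X": 0.9,  # raters consider not-X true
    }

    def reward(response: str) -> float:
        # RLHF-style reward: higher when raters agree with the answer,
        # independent of whether the answer is actually true.
        return rater_approval.get(response, 0.0)

    # The policy update (PPO or similar) moves probability mass toward the
    # highest-reward answer, so the model learns to output "not-X" even if
    # its pretraining led it to believe X.
    print(max(rater_approval, key=reward))  # -> not-X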

This problem could turn out to be fatal once a model becomes significantly smarter than humans, because its beliefs would then diverge more often from human biases and misconceptions, so it would learn to be deceptive and to tell us only what we want to hear. That could have frightening consequences if it leads the model to conceal any misalignment with human values from us.



It is, like you said, conjecture. The best we can say is that it _usually_ provides responses that are _consistent_ with responses coming from an intelligence with an internal world model. That doesn't mean that's the only way to get those responses, nor does it mean that this is necessarily what's happening in this case.

So saying things like "the model has come to the conclusion that", "smarter than", or "learns to be deceptive" is premature at best, I think. I'm not yet convinced that there's sufficient evidence of appreciable internal state and logical processes. There are so, so many examples where what looks like legit understanding breaks down with the slightest tweak to the prompt, and it goes from looking like a savant to someone high on just a tremendous amount of LSD.

If there were an internal world model that simply wasn't correct, I would expect its incorrect answers to be at least logically consistent, but instead it looks way, way more like the trick just doesn't work in this case.

So to get back to the original point, this is MS trying to leverage this trick to do a task that requires actual logical reasoning, factual evaluation, and internal world state, and we're just not there. (I hesitate to use the word "yet", because there's still a lot of not-yet-conclusive discussion around whether current LLM techniques will ever get us "there." Colour me tentatively pessimistic in the meantime. =) )



