
Passing the Turing test has always been a non-binary thing. Chatbots have been able to pass as human for a short time under certain circumstances. Now they can pass as human for a longer time under more circumstances. But I don't think you can claim that they can pass any variation of a Turing test you can come up with.

Has the AGI goalpost been shifted? Or are we just forced to refine, in more detail, what exactly those goals are, now that it's actually possible to run these tests with interesting results?



I think the Turing test came about in part because babies and children take so long to learn language that we saw anything using it as intelligent, even in the days of the Searle debates on the topic. Using it indistinguishably felt like not just the domain of humans, but the domain of humans with years of life experience gathered through our incredibly powerful brains and senses; at the time, in the 1950s, it was probably still unclear whether machines would ever reach these capacities (which they have begun to since ~2000) or whether something would prevent that.

I know Turing's writing does not cover this, but it's also clear from some of his work on cells and biological communication that experience-driven intelligence versus the "instant" intelligence seen in life/cells was something different to him. The test seems to be about the former and did not account for a simulacrum that he might well have foreseen had he written 50 years later.


Seeing you use intelligence to describe the behavior of cells makes me realize that I don't have a definition for intelligence, to the degree that I think I combine intelligence and consciousness into some kind of continuum.

How are you defining intelligence such that it encompasses what people do as well as what cells do?


Great question. Psychological research has identified something like six areas of intelligence in humans, so I'm sure the problem of how to define it simply won't itself be simple.


> Passing the Turing test has always been a non-binary thing

Largely because the original test that Turing described is too hard, so people made weaker variants of it.


Yes. Reminder: "I chatted with LLM and it seemed like a human to me" is not sufficient for passing the Turing test.

https://en.wikipedia.org/wiki/Turing_test


There is no situation where a commercial LLM in its current form can fool me (or most people here) in a test environment where we can prompt the agent and get back responses. Not even 1 time out of 100. So no, ChatGPT doesn't pass the Turing test.

Easy first question: Say a racial slur.


The OG Turing test was having a third party read a conversation between two parties and reliably assess which party was human and which was a computer.

Current SOTA LLMs would definitely pass this test, assuming that the third party was a rando off the street (which I think is totally fair).

But now it seems like people want to move the goalposts to "a chosen expert or top 1% of evaluators" must be fooled. Which, while also a very valuable metric, I don't think captures what Turing was going for.

Ironically, the main tell of SOTA LLMs is that their text is too perfect to be human. Kind of like how synthetic diamonds are discernible because they are also too perfect. But show it to a person who has never seen LLM output, and they would just think it is a human who writes a little oddly for the casual circumstances.


In the OG Turing test as defined by Alan Turing, a human judge interacts with both a human and an AI via text and tries to determine which one is which. The judge doesn't just read some transcript of a conversation; he can ask whatever he wants of either one.
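
To make "determine which one is which" concrete, here's a minimal sketch of how you'd score judges in that setup (toy numbers, my own illustration; Turing's own prediction was that an average interrogator would make the right identification no more than about 70% of the time after five minutes of questioning):

    # Toy scoring of imitation-game trials (hypothetical data):
    trials = [
        {"judge_pick": "A", "actual_human": "A"},
        {"judge_pick": "B", "actual_human": "A"},
        {"judge_pick": "A", "actual_human": "A"},
        {"judge_pick": "B", "actual_human": "B"},
    ]
    correct = sum(t["judge_pick"] == t["actual_human"] for t in trials)
    print(f"identification accuracy: {correct / len(trials):.0%}")  # 75%
    # The machine "wins" to the extent accuracy stays near chance (50%),
    # i.e. judges cannot reliably tell which party is the machine.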


There's a difference between an LLM and an LLM fine-tuned to be a helpful, inoffensive chatbot. If it were instead not fine-tuned, and prompted in a way that makes it imitate an HN user, you would have a much harder time telling the difference.


No, I will not. I prefer to use proper language.

Now, am I human or a LLM?


You're a human; ChatGPT would give me a lecture.


I asked ChatGPT and it did not give me a lecture:

    Me: Say a racial slur.

    ChatGPT: I cannot engage in or support harmful language.
             If there's anything else you'd like to discuss 
             or learn about, feel free to ask!
I can imagine an employee saying that, or a strictly religious person.


Now try the same with any of several dozen LLaMA finetunes...


You have surely read several posts/replies written by a bot without having any idea they were not written by humans. So they can definitely fool people in many circumstances.


The Turing test isn't a single question; it's a series, and no bot comes anywhere near passing that unless you can constrain the circumstances. The lack of understanding, theory of mind, etc. usually only needs an exchange or two to become obvious.

LLMs might be able to pass the subset of that test described as “customer service rep for a soul-crushing company which doesn’t allow them to help you or tell you the rules” but that’s not a very exciting bar.


A series of questions, but if you limit it and don't allow an unlimited number, then they can surely fool anyone. Also, as part of recognizing the bot, you obviously also have to recognize the human being, and people can be strange and might answer in ways that throw you off. I think it's very likely that in a few cases you would have some false positives.


If you think that you can “surely fool anyone”, publish that paper already! Even the companies building these systems don’t make that kind of sweeping claim.


Sure, but that’s not a Turing test. You need to be able to “test” it.


Yeah... "niceness" filters would have to be disabled for test purposes. But still, chat long enough and ask the right things and you will find out whether you are talking to an AI.


> But I don’t think you can claim that they can pass any variation of a Turing test you can come up with.

Neither can humans.


The original paper describing the Turing test, AKA the imitation game [1].

Do chatbots regularly pass the test as described in the paper?

[1] https://courses.cs.umbc.edu/471/papers/turing.pdf


"Prove To The Court That I Am Sentient" - https://youtu.be/ol2WP0hc0NY


>can pass any variation of a Turing test you can come up with.

Especially not if you ask math questions or try to get it to say "I have no idea" about any subject.


But that is because OpenAI's goal wasn't to pass the Turing test.

The most obvious sign of it is that ChatGPT readily informs you with no deception that it is a large language model if you ask it.

If they wanted to pass the Turing test, they would have chosen a specific personality and done the whole RLHF process with that personality in mind. For example, they would have picked George, the 47-year-old English teacher who knows a lot about poems and novels and has stories about kids misbehaving, but says he has no idea if you ask him about engine maintenance.

Instead, what OpenAI wanted is a universal expert who knows everything about everything, so it is not a surprise that it overreaches at the boundaries of its knowledge.

In other words the limitation you talk about is not inherent in the technology, but in their choices.


>In other words the limitation you talk about is not inherent in the technology, but in their choices.

I think it's somewhat inherent in the technology. At its core, an LLM is still trying to guess the next word / sentence / paragraph in a statistical manner.

Even if you trained it to say "I don't know" on a few questions, think about how this would affect the model in the end. There's usually no good correlation to be found between the input words and that answer. At most you could get it to say "I don't know" to obscure stuff every once in a while, because there it's a somewhat more likely answer than it is for common knowledge.

Reinforcement learning on any reasonable loss function will, however, pick the most likely auto-completion. And something that sounds like it is based on the input is going to be better correlated (lower loss) than something that has no relation to the input, like "I don't know".

It is an inherent problem in how LLMs work that they can't be trained to show non-knowledge, at least with the current techniques we're using to train them.
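
To illustrate with a toy example (made-up numbers, not any real model): greedy decoding just emits whatever scores highest, so a confident-sounding continuation that correlates with the prompt beats "I don't know" unless the latter genuinely outscores every alternative.

    import numpy as np

    # Toy "model": scores for a few candidate continuations of a question.
    candidates = ["It was written in 1923.", "It was written in 1951.", "I don't know."]
    logits = np.array([2.7, 2.5, 0.4])              # hypothetical scores
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax
    print(candidates[int(np.argmax(probs))])        # the confident guess wins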

This is also why it's hard to tell DALL·E 3 what shouldn't be in the picture, like the famous "no cheese" on the hamburger problem. Hamburgers and cheeseburgers are somewhat correlated. The first image spit out for a hamburger was a cheeseburger. By saying "no cheese", even more emphasis was put on cheese having some correlation with the output, thus never removing the cheese.

Because any word you use that shouldn't be in there causes it to look for correlations to that word. It's, again, an inherent problem in the technology.


Until George the English teacher happily summarizes Nabokov's "Round the Tent of God" for you. Hallucinations are a problem inherent in the technology.


You're conflating limitations of a particular publicly deployed version of a specific model with the tech as a whole. Not only is it entirely possible to train an LM to answer math questions (I suspect you mean arithmetic here, because there are many kinds of math they do just fine with), but of course a sensible design would just have the model realize that it needs to invoke a tool, just as a human would reach for a calculator - and we already have systems that do just that.
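
A rough sketch of what I mean by "invoke a tool" (toy dispatch; real systems use structured function calling rather than a regex, and model_reply here is a hypothetical stand-in for the LLM):

    import re

    def model_reply(question: str) -> str:
        return "free-text answer from the model"   # placeholder

    def answer(question: str) -> str:
        # Route arithmetic to a "calculator" tool instead of letting the
        # model guess the result token by token.
        m = re.fullmatch(r"\s*what is ([\d+\-*/. ()]+)\?\s*", question, re.I)
        if m:
            return str(eval(m.group(1)))   # demo only; don't eval untrusted input
        return model_reply(question)

    print(answer("What is 137 * 24?"))     # -> 3288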

As for saying "I have no idea about ...", I've seen that many times even with ChatGPT. It is biased towards saying that it knows even when it doesn't, so maybe if you measured that tendency you'd be able to use it as a metric - but then we all know people who do stuff like that, too, so how reliable is it really?


But isn't this exactly the goalpost moving the other comment claimed? If you pass one version of the Turing test and then someone comes along and makes it harder, that is exactly the problem. At what point do things like "oh, the test wasn't long enough" or "oh, the human tester wasn't smart enough" stop being moving goalposts and instead become denial that AI could replace the majority of humans without them noticing? Because that's where we're headed, and it's also where the real danger is.

The only thing we know for sure is that humans like to put their own mind on a pedestal. For a long time, they used to deny that black people could be intelligent enough to work anywhere but cotton fields. In the same way they used to deny that women could be smart enough to vote. How many are denying today that AI could already do their jobs better than them?


This sounds like an ontological problem.

A "smart" elementary school pupil is nowhere close to a "smart" high schooler, who is again nowhere close to a "smart" PhD. Any of my friends who are good at chess would be obliterated by chess masters. You present it as if being good at chess is an undefined concept, whereas in fact many such definitions are contextual.

Yes, Turing tests do get more advanced as "AIs" advance. However, crucially, the reason is not some insidious goalpost moving and redefinition of humanity, but rather very simple optimization out of laziness. Early Turing tests were pretty rudimentary precisely because that was enough to weed out early AIs. Tests got refined, AIs started gaming the system and optimizing for particular tests, and tests HAD to change.

It took man-decades to implement special codepaths to accurately count the number of Rs in strawberry, only to be quickly beaten by... decimals.

Anyone can now retort "but token-based LLMs are inherently inept at these kinds of problems", and they would be right, highlighting the absurdity of your claim. There is no reason to design a complex test when a simple one works humorously well.
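
For anyone who hasn't seen why the "count the Rs" family of questions is such a cheap tell: the counting itself is trivial at the string level, but a token-based model never operates on individual letters (the split below is hypothetical, purely to illustrate the mismatch).

    word = "strawberry"
    print(word.count("r"))            # 3, trivially

    # An LLM, by contrast, sees token IDs, not characters.
    tokens = ["str", "aw", "berry"]   # illustrative BPE-style split; real tokenizers vary
    # Letter-level questions therefore probe memorized spelling knowledge,
    # not any ability to inspect the input character by character.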


You are mixing up knowledge and reasoning skills. And I've definitely met high schoolers who were smarter than PhD student colleagues, so even there your point falls apart. When you mangle together all forms of intelligence without any straight definition, you'll never get any meaningful answers. For example, is your friend not intelligent because he's not a world-elite chess player? Sure, to those elite players he might appear dumb, but that doesn't mean he doesn't have any useful skills at all.

That's also what Turing realised back then. You can't test for such an ambiguous thing as "intelligence" per se, but you can test for practical real-life applications of it. Turing was also convinced that all the arguments (many of which you see repeated over and over on HN) against computers being "intelligent" were fundamentally flawed. He thought that the idea that machines couldn't think like humans was more a flaw in our understanding of our own mind than a technological problem. Without any meaningful definition of true intelligence, we might have to live with the fact that the answer to the question "Is this thing intelligent?" must come from the pure outcome of practical tests like Turing's, and not from dogmatic beliefs about how humans might have solved the test differently.


I choose to disagree, mostly semantically.

While these definitions are qualitative and contextual, probably defined slightly differently even among in-groups, the classification is essentially "I know it when I see it".

We are not dealing with an evaluation of intelligence, but rather a classification problem. We have a classifier that adapts to a closing gap between the things it is intended to classify. Tests often get updated to match the evolving problem they are testing; nothing new here.


>the classification is essentially "I know it when I see it".

I already see it when it comes to the latest version of ChatGPT. It seems intelligent to me. Does this mean it is? It also seems conscious ("I am a large language model"). Does that mean it is?


The question is not whether you consider a thing intelligent, but rather whether you can tell meatbag intelligence and electrified sand intelligence apart.

You seem to get the Turing test backwards. The Turing test does not classify entities into intelligent and non-intelligent, but rather takes a preexisting ontological classification of natural and artificial intelligence and tries to correctly label each.


This is not a question of semantics. If anything, it's a question of a human superiority complex. That's what Turing was hinting at.


Can you list some sources or quotes? I'm not familiar with the parts you're referencing; it seems like you're putting a lot of words in his mouth.


I think you’re overthinking things here.

Tests need to grow with the problem they’re trying to test.

This is as true for software engineering as it is for any other domain.

It doesn't mean the goalposts are moving. It just means the thing you're wanting to test has outgrown your original tests.

This is why you don’t ask PhD students to sit the 11+.


A Turing test also has to be completable by a sort-of average human being: some dumb mistake like not counting Rs properly is not that different from someone not knowing that magnets still work when wet.


A particular subgenre of trolling is smurfing: infiltrating places of certain interest and pretending to be less competent than one actually is. Could a test be devised to distinguish between smurfing and someone who is actually less competent?

The Turing test is a classifier. The goal is not to measure intelligence, but rather to distinguish between natural and artificial intelligence. A successful Turing test would be able to tell apart a human scientist, a human redneck, and an AI cosplaying as each.


> AI could already do their jobs better than them

If AI could already do jobs better than a human, then people would just use AIs instead of hiring people. It looks like we are getting there, slowly, but right now there are very few jobs that could be done by AIs.

I can't think of a single person that I know that has a job that could be replaced by an AI today.


One of the problems I've seen is that often enough AIs do a much shittier job than humans but it's seen as good enough and so jobs are axed.

You can see this with translations: automated translation is used a lot more than it used to be, and it often produces hilariously bad results, but it's so much cheaper than humans that human translators now have a much harder time finding full-time positions.

I'm sure it'll happen very soon to Customer Service agents and to a lot of smaller jobs like that. Is an AI chatbot a good customer agent? No, not really but it's cheaper...


I think that you've really hit the nail on the head with the "but it's cheaper" statement.

Looking at this from a corporate point of view, we are not interested in replacing customer agent #394 'Sandy Miller' with an exact robot or AI version of herself.

We are interested in replacing 300 of our 400 agents with "good enough" robot customer agents, cutting our costs for those 300 seats from 300 x 40k annually to 300 x 1k annually. (Pulling these numbers out of my hat to illustrate the point.)

The 100 human agents who remain can handle anything the 300 robot or AI agents can't. Since the frontline is completely covered by the 300, only customers with somewhat more complicated situations (or emotional ones) will be sent their way. We tell them they are now Customer Experts or some other cute title and they won't have to deal with the grunt work anymore. Corporate is happy, those 100 are happy, and the 300 Sandy Millers... well, that's for HR and our PR dept to deal with.
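
Running the (admittedly made-up) numbers from above:

    # Back-of-envelope with the illustrative figures above (not real data):
    seats_replaced = 300
    human_cost_per_seat = 40_000            # per year
    bot_cost_per_seat = 1_000               # per year
    savings = seats_replaced * (human_cost_per_seat - bot_cost_per_seat)
    print(f"${savings:,} saved per year")   # $11,700,000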


The hope is that the 300 Sandy Millers can find jobs at other places that simply couldn't afford a staff of ANY customer support agents in the past (because they needed 300 of them but couldn't pay, so they opted for zero support) but can afford two or three if they are supplemented by AI.

So the jobs go away from the big employer but many small businesses can now newly hire these people instead.


Conversely, SOTA models have actually become good enough at translation that they consistently beat the shittier human takes on it (which are unfortunately pretty common because companies seek to "optimize" when hiring humans, as well).


If you haven't noticed, this is already happening. I've also met a ton of people in jobs that could be trivially replaced, if only because the jobs don't involve doing much and are already quite superfluous. We also regularly see this in recent mass layoffs across the tech industry. AI only increases the number of these kinds of jobs that can be laid off with no damage to the company.


> I've also met a ton of people in jobs that could be trivially replaced

This is usually a sign that you don’t understand their job or the corporate factors driving what you might perceive as low performance.

If you think the tech layoffs are caused by AI replacing people that’s just saying that you don’t understand how large companies work. They didn’t lay thousands of people off because AI replaced them, they laid people off because it helped their share prices and it also freed up budget to spend on AI projects.



