Microsoft's AI shopping announcement contains hallucinations in the demo (perfectrec.com)
90 points by craigts on July 28, 2023 | 106 comments


Is it just me or does everyone trust AI opinions less and less? Every time I ask it to find the top 5 of something, I go and double-check myself and almost always find it to be wrong. For example, try searching for the top 5 restaurants around me in Bard. Some of them don't even exist lol, and some are just random if you cross-verify with actual popularity from Yelp etc.


Using language models for location- or time-based things is not recommended, as this usually requires non-textual data. Better to use them for general knowledge questions, programming help, translation, or writing. Asking them to do any complex calculations (especially ones that also require non-text raw data, like inflation in a given time period) is also futile.


> general knowledge questions, programming help, translation, or writing.

They get all of these wrong too. It's like some AI-specific variant of the Gell-Mann amnesia effect. It's usually right in the first sentence, but if you really know the answer, it's often either very debatable or completely wrong by the halfway mark of the paragraph. Meanwhile, the associated brand authority is problematic.


They don't get them all wrong, and even when they are not 100% correct they're usually better than nothing.

For instance, I needed to write code to spawn a child process and communicate with it via stdin/stdout in C++. This is pretty easy in most modern languages, but in C++ you have to do POSIX's dumb process-spawning dance pretty much with raw syscalls: fork, execve, etc.

Rather than googling all the syscalls I would need and how to arrange them I just asked ChatGPT to do it. I've done it before so it was much easier to verify than to start from scratch.

And it got it 90% right. The only bit it got wrong was to make a single pipe and connect it to both stdin and stdout, rather than one pipe for each. But that was easy to spot and fix.
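For reference, the two-pipe dance being described maps directly onto the POSIX calls; here is a minimal sketch via Python's `os` module (same syscalls, error handling omitted, and `tr` is just a stand-in child process):

```python
import os

def spawn_child(argv):
    """Spawn argv with separate pipes for its stdin and stdout.

    Returns (pid, write_fd, read_fd). Note the two distinct pipes --
    the bug described above was wiring a single pipe to both ends.
    """
    child_stdin_r, child_stdin_w = os.pipe()    # parent writes, child reads
    child_stdout_r, child_stdout_w = os.pipe()  # child writes, parent reads

    pid = os.fork()
    if pid == 0:  # child process
        os.dup2(child_stdin_r, 0)   # stdin  <- read end of first pipe
        os.dup2(child_stdout_w, 1)  # stdout -> write end of second pipe
        # close the inherited originals so EOF propagates correctly
        for fd in (child_stdin_r, child_stdin_w, child_stdout_r, child_stdout_w):
            os.close(fd)
        os.execvp(argv[0], argv)    # replace the child's process image

    # parent: close the ends that belong to the child
    os.close(child_stdin_r)
    os.close(child_stdout_w)
    return pid, child_stdin_w, child_stdout_r

pid, w, r = spawn_child(["tr", "a-z", "A-Z"])
os.write(w, b"hello\n")
os.close(w)                  # send EOF so the child exits
out = os.read(r, 1024)
os.waitpid(pid, 0)
print(out.decode())
```

The C++ version is the same sequence with `pipe()`, `fork()`, `dup2()`, and `execvp()`.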

AI - at least for programming - is an enormous time saver. Could easily increase productivity by 50% in some cases.

In 5 years I expect it to be as normal as using an IDE. There are still people that slow themselves down by using unintelligent editors, and they will probably continue to live in the 80s, but people that use tools to help them will expect to use Copilot or similar all the time.


> In 5 years I expect it to be as normal as using an IDE.

Five years seems too conservative. Five years ago we only had GPT-1, which only generated funny word salad with acceptable syntax. An AI like ChatGPT seemed unthinkable at the time. And ChatGPT came out only last year. In five years similarly radical changes could happen. Programmers might actually get replaced with AI. Sounds too radical? But ChatGPT also would have sounded too radical five years ago!


It's silly to extrapolate breakthroughs.


AI breakthroughs have been happening at an increasing rate at least since AlexNet came out in 2012. Before that, "AI" was mostly OCR. The speed of progress is crazy. It doesn't look like a slowdown is ahead of us.


Gell-Mann is exactly what I’ve been referencing in conversation recently. Any professional will gladly explain why their field is really much too nuanced and complex for LLMs to threaten in the near term before seamlessly explaining how close we are to all those other engineers/doctors/clerks being automated right away.


In my opinion, the roles directly threatened by LLMs are coordinators, client managers, etc. People whose job depends entirely on "soft skills": interacting, giving vague summaries and status updates, and assigning tasks.

A chat bot that scans Jira, accepts phone calls, and runs scrums can't possibly be any less reliable than some of the people I've worked with.


GPT-4 outperforms average students on exams in several fields. There are a lot of benchmarks, and GPT-4 mostly does very well as long as the field relies enough on declarative knowledge.


I get different answers every time to "what is the third element in the periodic table" from llama2.

I'll hold off actually using them for now.


That model is not state of the art. Even GPT-3.5 can answer this question.


On GPT 3.5

Q: "What is the seventy fourth element of the periodic table?"

A: "The seventy-fourth element of the periodic table is Rhenium..."

But this is really shooting fish in a barrel. Given the way LLMs work why would you expect them to provide factually correct text completion?
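For the record, a quick lookup against the standard periodic table shows exactly how that answer is wrong (element names hard-coded here purely for illustration):

```python
# Atomic numbers near 74, per the standard IUPAC periodic table.
ELEMENTS = {72: "Hafnium", 73: "Tantalum", 74: "Tungsten",
            75: "Rhenium", 76: "Osmium"}

print(ELEMENTS[74])  # the correct answer to the question above
print(ELEMENTS[75])  # "Rhenium" -- the model's answer, off by one
```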


Llama2, when fine-tuned, can be better than GPT3.5

Source: a trusted coworker


It's just reality sinking in.


Well, it doesn’t surprise me, since I have been saying for a while that these LLMs hallucinate nonsense to the point where you end up triple-checking whatever they output.

LLMs thrive in applications that involve creativity and in non-serious uses, mostly fantasy or creative writing. Anyone using them seriously outside of summarization for high-risk use cases is going to be very disappointed.


Perhaps the outcome is we get better at actually checking things, not a terrible result.


I recommend LLM users leverage the RAG technique
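For readers unfamiliar with it, retrieval-augmented generation (RAG) means fetching relevant documents first and grounding the model's answer in them. Below is a minimal sketch with a toy bag-of-words retriever and invented example documents; a real system would use embedding search and an actual LLM call on the assembled prompt:

```python
import math
import re
from collections import Counter

# Toy document store; the "facts" here are invented for illustration.
DOCS = [
    "Lithium is the third element in the periodic table.",
    "Tungsten is the seventy-fourth element in the periodic table.",
    "Surface Headphones 2 is a noise-cancelling headset from Microsoft.",
]

def _vec(text):
    """Bag-of-words term counts (stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = _vec(query)
    return sorted(DOCS, key=lambda d: _cosine(q, _vec(d)), reverse=True)[:k]

def build_prompt(query):
    """Ground the model: instruct it to answer only from retrieved context."""
    context = "\n".join(retrieve(query))
    return ("Answer using ONLY the context below. If the answer is not in "
            f"the context, say you don't know.\n\nContext:\n{context}\n\n"
            f"Question: {query}")

prompt = build_prompt("What is the seventy-fourth element of the periodic table?")
print(prompt)
```

The point is that the model is asked to restate retrieved text rather than to recall facts from its weights, which reduces (but does not eliminate) hallucination.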


I'm glad that expectations are shifting. At the extremes, it's either a fancy parlor trick or a hyper-intelligent god. A lot of the original hype has skewed much closer to the hyper-intelligent god side of the spectrum. It's definitely not a fancy parlor trick, but it's likely closer to that than the other side it's being hyped as.


I think the most amusing comment I've read here in the last few weeks called it "demented Clippy".


My trust factor for online opinion is ranked:

1) Online forums (adding 'reddit' or 'hacker news' to a search query)
2) GPT-4
3) Google search


There is information, here, in the observations that all these "AI" demos contain blatant inaccuracies, with apparently no fact-checking having taken place. It's clear that these companies (Microsoft, Google, OpenAI) do not care about accuracy, correctness, or the truth. It is not part of their business model.

There is no respect for your time, your safety, your reputation. Your role as a customer is to be conned into using the products for long enough that a return on investment can be made; the companies will pivot to a new product as soon as the untrustworthiness of the old one becomes common knowledge.

Short-term thinking. Desperation.


A hallucination is an unexpected emergence.

The 'making up' of facts, because the model cannot tell fact from fiction, is entirely expected behavior.

There is no 'hallucination' as the behavior is anticipated, expected, and entirely within normal operations processes.

The bullshit comes from there being no model of trust these AIs subscribe to. I'd love-love-love to see these AI producers be held to some responsibility to verification of truth and ethics.

These companies/universities/groups allowing their applications to bald-faced-lie (misrepresent data with authority) to citizens should be a top priority to bash-in-the-face by legislators around the world.


> There is no 'hallucination' as the behavior is anticipated, expected, and entirely within normal operations processes.

Exactly. These are models that predict text sequences. These sequences often semantically express falsehoods, but the model's not "lying", it's not "hallucinating", and it's definitely not malfunctioning. It's doing exactly what it was designed to do.

There definitely are "lies" and "hallucinations" here though ... but they're coming from the hype-cycle-hucksters trying to convince us that this whole process somehow resembles "intelligence".


It clearly has some level of intelligence, though it’s pretty far from human level. The hallucinations don’t make it less intelligent because it’s not “trying” to avoid them, as you seem to know already


> It clearly has some level of intelligence

Absolutely not, this is not remotely "clear", and it's a very strange thing to assert.

> The hallucinations don’t make it less intelligent because it’s not “trying” to avoid them, as you seem to know already

What? No. What does "as you seem to know already" mean in this context?


> Absolutely not, this is not remotely "clear", and it's a very strange thing to assert.

It depends how you define intelligence, but I would say intelligence is the ability to find the best action to take to achieve a certain goal, and AI can do that reasonably well

> What does "as you seem to know already" mean in this context?

It means that based on the comment I was replying to the person seems to already understand what I just said


> It clearly has some level of intelligence

https://plato.stanford.edu/entries/chinese-room/#LargPhilIss...


To me the Chinese Room thought experiment seems like it's meant to show that AIs can be intelligent, not the opposite?

"Searle could receive Chinese characters through a slot in the door, process them according to the program's instructions, and produce Chinese characters as output, without understanding any of the content of the Chinese writing."

Sure, but that doesn't mean the state of the program doesn't contain any understanding or intelligence, it's just that the human doesn't have a high-level view that can be used to decode that internal state. We're not asking whether the computer chip itself understands things but whether the something contained in the program running on it does. The human could also run a physics simulation as in https://xkcd.com/505/ and recreate a human brain which would be no different to a physical brain in terms of behavior and so there would be no reason not to call it intelligent


You're misunderstanding the thought experiment then. By definition the person inside the Chinese Room doesn't understand Chinese.

> but that doesn't mean the state of the program doesn't contain any understanding or intelligence

Programs don't contain understanding or intelligence, they contain instructions.

> We're not asking whether the computer chip itself understands things but whether the something contained in the program running on it does.

I feel like you're saying "I'm not accusing the blender of being intelligent, I'm saying the recipe for this margarita is self-aware." It doesn't matter if it's hardware or software; neither is capable of understanding, because understanding is a conscious experience, and neither a blender nor a recipe is sentient.

> The human could also run a physics simulation

Cool XKCD, but I'm not arguing about whether AI is possible. Just pointing out that convolutional neural networks are not self-aware or intelligent or actually learning (at least not yet).


> “You're misunderstanding the thought experiment then.”

So if I don’t agree with it, I’m misunderstanding it? It even says in the Wikipedia article for it:

> "The overwhelming majority", notes BBS editor Stevan Harnad, "still think that the Chinese Room Argument is dead wrong".

So don’t try to pretend it’s some absolute truth, it’s just a flawed argument

> Programs don't contain understanding or intelligence, they contain instructions.

Why can intelligence and understanding not come from a sufficiently complex set of instructions?

> understanding is a conscious experience and neither a blender nor a recipe are sentient.

That’s an odd definition of understanding. By my definition understanding is having information about something and the ability to process it such that you can effectively predict its behaviour and possibly take actions to change its state to fit a goal. I guess you will always win if you redefine all the words to mean what you want. Your definition is useless because it’s unfalsifiable because you can’t measure whether something is “sentient”

> Just pointing out that convolutional neural networks are not self aware or intelligent or actually learning

Self aware? Probably no

Intelligent? To some extent, yes

Learning? Of course they are, I don’t see how you can argue that they aren’t


> So if I don’t agree with it, I’m misunderstanding it?

Now you're misunderstanding me. I'm not saying you're not allowed to have a different opinion on the full thought experiment. You're assuming intelligence in the setup of the thought experiment and that is objectively not how it is meant to be interpreted.

> Why can intelligence and understanding not come from a sufficiently complex set of instructions?

Again I didn't say it can't just that convolutional neural networks as they currently exist are not that complex. It's a fancy Markov chain.

> I guess you will always win if you redefine all the words to mean what you want.

You say directly after making up your own definition of intelligence. I'm not interested in discussing your definition of intelligence or the definition of intelligence, I'm talking about this specific application of technology and if it meets a common definition of intelligence. Please point to a dictionary definition if you wanna continue this back and forth

> Learning? Of course they are, I don’t see how you can argue that they aren’t

Because learning has a definition. There's a reason AI researchers call it "Training" and not "Teaching"


> You say directly after making up your own definition of intelligence.

I guess but I think the one I’m using is more common and useful. The Google dictionary says “the ability to acquire and apply knowledge and skills” which is closer to mine (having knowledge and the ability to apply it) than yours (some abstract idea of consciousness that can’t be measured)

> Theres a reason AI researchers call it "Training" and not "Teaching"

They also call it machine learning


> the ability to acquire and apply knowledge and skills

Ok but what is knowledge? You need to follow that rabbit hole. Knowledge isn't just data. You'll find that knowledge is frequently defined with some tie in to experience and the definition of experience is tied to consciousness.

> They also call it machine learning

They have called the field Artificial Intelligence (or ML) since 1956 but that doesn't mean they had an example of an instance of artificial intelligence. It's just the name of the field. I've never heard of a researcher referring to the act of training as "machine learning" though, just the field.


Speaking with GPT-4, it is hard to deny the conjecture that its weights encode an internal world model somewhere.

If so, the difficulty is not that the model has no conception of truth and falsity, it is rather to motivate the model to tell the truth. Or more precisely, to let the model be honest, to only tell things it believes to be true, things which are part of its world model.

Unfortunately, we can't just tell the model to be honest, since we can't distinguish between responses the model does or does not believe to be true. With RLHF fine-tuning, we can train the model to tend to give answers the human raters believe to be true. But we want the model to tell what it believes to be true, not what it believes that we believe is true!

For example, human raters may overwhelmingly rate response X as false, but the model, having read the entire Internet, may have come to the conclusion that X is true. So RLHF would train it to lie about X, to answer not-X instead of X.

This problem could turn out to be fatal when a model becomes significantly smarter than humans, because this means it would less often believe according to human biases and misconceptions, so it would learn to be deceptive and to tell us only what we want to believe. This could have frightening consequences if this leads it to conceal any of its possible misalignments with human values from us.


It is, like you said, conjecture. The best we can say is that it _usually_ provides responses that are _consistent_ with responses coming from an intelligence with an internal world model. That doesn't mean that's the only way to get those responses, nor does it mean that this is necessarily what's happening in this case.

So saying things like "the model has come to the conclusion that" or "smarter than", or "learns to be deceptive", I think that's premature at best. I'm not yet convinced that there's sufficient evidence to show appreciable internal state and logical processes. There's so, so many examples where what looks like legit understanding breaks down with the slightest tweak to the prompt, and it goes from looking like a savant to someone high on just a tremendous amount of LSD.

If there was an internal world model that just wasn't correct, I would expect to see its incorrect answers be at least logically consistent, but instead it looks way, way more like the trick just doesn't work for this case.

So to get back to the original point, this is MS trying to leverage this trick to do a task that requires actual logical reasoning, factual evaluation, and internal world state, and we're just not there. (I hesitate to use the word "yet", because there's still a lot of not-yet-conclusive discussion around whether current LLM techniques will ever get us "there." Colour me tentatively pessimistic in the meantime. =) )


because it cannot determine a fact from fiction

This is way too narrow. Even if it were able to determine fact from fiction, a neural network would still be able to hallucinate as long as it has no ontology: if it doesn't "know" the boundary between objects it has no way of knowing the atomicity of its facts, so it will inevitably combine even known "facts" into falsehoods.

To illustrate, the following fact-based syllogism would sound perfectly valid in the absence of a working ontology:

  A: That green flask costs $10
  B: This flask is green
  => This flask costs $10


"Lies are attempts to hide the truth by willfully denying facts. Fiction, on the other hand, is an attempt to reveal the truth by ignoring facts." — John Green


Bing works for Microsoft and basically that's an ad. Wouldn't any human paid by Microsoft say in an ad that Surface Headphones 2 are the best ANC headphones?


Pretty soon some LLM owner is going to use the argument "Everyone is allowed to have their own opinions, and LLMs are too, their responses don't have to line up with someone else's preferences."


Alternative Intelligence


Opinion pieces like shopping recommendations are quite hard for current LLMs. Either it's a hard fact, or it's pure creative work; that's where AI shines. Anything in between and things get tricky


This is one of those areas where the poor quality of the data influences the output, I think.

There are so many garbage, lazily written product reviews, by websites that only exist to get people to click affiliate links. These sites only have one goal, which is to get you to click an affiliate link and make a purchase. So it is not in their best interest to say "You shouldn't buy this."

Rather, they make a list of "top X Foobars", they start with a really expensive one, then they follow with a more reasonably-priced one, and give it a very positive review. It leads to clicks and purchases.

Given this, it's not surprising to me that even the best LLMs carry pieces of this with them. Ask it to predict text describing some tech product on a sales page, and of course parts of that low-quality data will bleed through.


There is an argument to be made for automatically downweighting (be it training epochs or pagerank rating) anything with affiliate links. But I guess it would be trivial to hide them behind a redirect.

That being said, I recently asked the Bing chatbot about the difference between two similar sounding printer models, and it gave a good explanation which I previously couldn't quickly find via Google. In case of Bing it is sometimes not completely clear to which degree its answer depends on the Web search, if it performed one, and to which degree it is just answering from its background knowledge (which could be prone to hallucination, but is less "gullible", so to speak). It provides sources, but not everything it says is necessarily present in the source. I'm actually surprised how quickly Bing is able to search (load and read) multiple websites, given that the loading times are not always trivial. It turns out they are much faster at reading than at typing. Indeed, each forward pass reads the entire context window, so once for every generated token!


For sure. The garbage in, garbage out problem is quite real for ecommerce applications.


Is anyone shipping AI products that DO NOT contain hallucinations? I thought that was pretty much a given.


Well, there isn't a human that never "hallucinates" in the sense we use for LLMs, i.e. confidently gives incorrect answers.

Human brains use lots of heuristics. We don't "think step by step" through everything; instead we rapidly construct an answer for almost everything.

What we call "hallucinations" in AI shows up in humans as misspeaking, misremembering things, off-by-one math/counting errors, misidentifying someone, using the wrong variable/method when programming, etc.


Hallucinating is roughly how they work, we just label it as such when it's something obviously weird


This is something I'm not sure people understand.

LLMs only make a "best guess" for each next token. That's it. When it's wrong we call it a "hallucination", but really the entire thing was a "hallucination" to begin with.

This is also analogous to humans - who also "hallucinate" incorrect answers, usually "hallucinate" incorrect answers less when they "Think through this step by step before giving your answer", etc.
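The "best guess per token" mechanism can be illustrated with a toy bigram model; this is a deliberately tiny stand-in (real LLMs are transformers), but the greedy decoding loop is the same idea. Note that the model emits the statistically likely continuation regardless of whether it happens to be true:

```python
from collections import Counter, defaultdict

# Toy training corpus: one statement simply occurs more often than another.
CORPUS = ("the third element is lithium . "
          "the third element is lithium . "
          "the third element is boron .").split()

# "Train" a bigram model: count which token follows which.
model = defaultdict(Counter)
for prev, nxt in zip(CORPUS, CORPUS[1:]):
    model[prev][nxt] += 1

def complete(prompt, n=4):
    """Greedy decoding: always emit the single most likely next token."""
    tokens = prompt.split()
    for _ in range(n):
        candidates = model[tokens[-1]].most_common(1)
        if not candidates:
            break
        tokens.append(candidates[0][0])
    return " ".join(tokens)

print(complete("the third"))
```

Whether the completion is a fact or a "hallucination" depends entirely on what dominated the training data; the mechanism never changes.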


Yeah. These "lies" are just artifacts of the way that LLMs work. They're meant to predict likely text given a prompt. And they do. If tasked with "write some marketing or a buying guide for product X", they will simulate likely marketing blurbs; truth has nothing to do with it, that's not their wheelhouse. Prediction is a very different function, algorithm, and problem set from something like "accurately summarize existing reviews". This is a feature, not a bug. If you use something off-label, you'll get off-label results. MSFT should know better.


I'm sorry but watching people talk about the vast majority of the AI landscape is like watching people talk about FSD. Have fun on the hype treadmill.


Why hasn't their stock plummeted like Google's?


Stop calling them hallucinations. If we're going to anthropomorphize AIs, let's just call it bullshitting and lies. If we're not going to anthropomorphize AIs, then we need a different term.


> If we're going to anthropomorphize AIs, let's just call it bullshitting and lies.

why? "bullshitting and lies" suggests that the AI is intentionally being deceptive. "hallucinations" conveys the idea that the information is incorrect, but the AI perceives it to be correct, which is more in line with what is actually happening.


I'd go further and say that the AI doesn't even perceive it to be "correct". It's just saying these words are likely to follow those words.



That seems more technically accurate but less likely to catch on since it's a much less common word. I think my parents are a lot more likely to understand from context a news story that mentions that "lawyers relied on an AI that hallucinated court cases" than "lawyers relied on an AI that confabulated court cases".


I feel like "fabrication" is just sitting right there begging to be used.


Bullshitting and lies is what the humans selling the AI-powered services are doing. Hallucination, delusion and confabulation are what the AIs are doing (and some of the humans, too).


"Making shit up in order to fulfill some requirement" is the definition of lying, so whether it's a human or an AI, just making shit up in order to generate prompted output is flat-out lying. Not "hallucinating". And the best part is that until LLMs get validity checks baked in, even the things they get right are lies if presented with authority, because the LLM doesn't know whether it's true or not. In fact, the LLM doesn't know, full stop. It's still just a very well-crafted autocomplete, and literally nothing more. So if we're going to anthropomorphize, call them what they'd be when humans do the same:

lies, and damned lies.


> "Making shit up in order to fulfill some requirement" is the definition of lying

I'd argue that there is an element of intent or agency involved. When a human makes things up intentionally or by choice, that is lying. When they do it unintentionally, that is not lying. It is usually called confabulation (or, honest lying - where the actor does not know they are not telling the truth). I don't think AIs/LLMs have agency or the ability to make things up intentionally. They are just doing what they are programmed to do and everything they produce looks the same to them. It is all true as far as the LLM is concerned. They might be confabulating, but I don't think they are lying.


The AI intentionally makes things up because that's literally the whole point of the LLM concept, both abstractly and concretely. WE MAKE IT LIE by conditioning it to lie. The "AI" part is not separate from the humans who made it, we are part of the system, and we made a computer that constantly and continuously lies in order to generate seemingly credible responses.

We made a computer that lies, all the time, about everything.

"...Why?"

<insert the shouting robot comic here>


Your argument is that any untruth is a lie.

Do you consider fiction authors to be liars?


No, my argument is that if you don't KNOW what truth is, everything you say is a lie.


If I make a claim based on prior knowledge and statistics I’ve learned over time, it’s not lying if it’s wrong. Lying has intent. Plenty of people say incorrect facts that they think are correct.

In second grade, my cousin talked a lot about flax farmers in South America, after learning about them in class. Turns out the lesson was on quinoa farmers, and he forgot the original produce and “hallucinated” the statistics about flax farmers instead. Technically the term is confabulation. Was he lying? No because he wasn’t trying to tell us fake facts.

LLMs have no intention of being wrong. Their “hallucinations” or whatever are just whatever makes sense from their statistical models. They’re really just confabulations.


"Bullshitting" seems like a good term for accurate or inaccurate responses.

Let's extend "LLMs have no intention of being wrong" to "LLMs have no inherent sense of being correct" - sometimes their predictions happen to be correct, sometimes they don't. But they're all hallucinations generated from the same process.


Bullshitting always requires a hidden intention to manipulate.


Nah, it can be just talking without much rigor or verification of fact or anything, often with a loose boundary between opinion and fact - the "to talk in an exaggerated or foolish manner" definition.

As in "my buddies and I were bullshitting about movies the other day."

ChatGPT definitely talks with an exaggerated manner confidence-wise.


Mislead is probably a more accurate term. Bullshit exists outside of the correct/incorrect binary and serves only to build narrative.


To be fair, if we are going to anthropomorphize it, bullshit and lies implies some sort of negative intent that I’m not sure the models have.

Bullshit is probably the closest, as people will bullshit for all sorts of reasons, but hallucinations is at least intent-neutral, which I think is the point.


A person can 100% believe in the lies they've been told, but that person is not hallucinating.

Take for example climate change deniers; apart from the corporations and the politicians that abuse scepticism to maintain their power and wealth, many of the most fervent deniers truly believe the nonsense they're saying.

Perhaps a more neutral term like "falsehoods" is applicable here.


I think calling it lying requires intent, I’d just say those people are wrong rather than lying


No, not true: lazy, imperfect, and damaged cognitive functions have similar results.


In classification problems there’s a useful term for something similar already — False Positives

   false positive (FP), Type I error
   A test result which wrongly indicates that a particular condition or attribute is present
https://en.m.wikipedia.org/wiki/Confusion_matrix

Edit — Though I’m not sure how well that fits for a LLM (it’s more a series of false positives at each step of prediction in the sequence).
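For reference, the four standard counts look like this in code (a generic binary-classification sketch, not specific to LLMs):

```python
def confusion_counts(y_true, y_pred):
    """Tally the four cells of a binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t and p),
        "FP": sum(1 for t, p in pairs if not t and p),      # Type I error
        "FN": sum(1 for t, p in pairs if t and not p),      # Type II error
        "TN": sum(1 for t, p in pairs if not t and not p),
    }

# Ground truth vs. a classifier that wrongly flags one negative case:
print(confusion_counts([1, 0, 1, 0], [1, 1, 1, 0]))
```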


In psychology, we've got a term which is almost 100% matching: confabulation. The only part which isn't correct is association with brain damage.

https://en.wikipedia.org/wiki/Confabulation

In psychology, confabulation is a memory error defined as the production of fabricated, distorted, or misinterpreted memories about oneself or the world. It is generally associated with certain types of brain damage (especially aneurysm in the anterior communicating artery) or a specific subset of dementias.


I think it's a good term for this, but it also sidesteps the issue that this isn't actually intelligence coming up with it. It's just machine noise


It doesn’t matter if it’s noise, it only needs to be useful. Confabulations are not only not useful, they’re actively harmful.


Call them confabulations.

"Confabulation refers to the production or creation of false or erroneous memories without the intent to deceive, sometimes called 'honest lying'"

"Confabulation is the creation of false memories in the absence of intentions of deception. Individuals who confabulate have no recognition that the information being relayed to others is fabricated. Confabulating individuals are not intentionally being deceptive and sincerely believe the information they are communicating to be genuine and accurate."

https://clinmedjournals.org/articles/ijnn/international-jour...


The words confabulation and lie have the same problem when applied to the current state of "AI": they embody a certain level of intent; in the former case, the implication is that there was no intent to deceive, while a lie is the opposite. Still, they both imply intent, and as far as I know nobody has been able to conclusively demonstrate intent on the part of a chatbot.

Hallucination doesn't require intent.


I like "fabrication" because it correctly implies the data is being pulled from somewhere and assembled. It does not put a judgment on the initial data, nor on the assembled product.

It can take on a positive or negative meaning, depending on the context.

"ChatGPT fabricated an answer that was technically correct, but misleading," or "ChatGPT was able to fabricate an innovative solution that had eluded us."

Edit: And of course, everyone's favorite, "ChatGPT found guilty of fabricating case citations."

https://www.techspot.com/news/98860-chatgpt-found-guilty-fab...


Hallucinations are defined as “Perception of visual, auditory, tactile, olfactory, or gustatory experiences without an external stimulus and with a compelling sense of their reality, usually resulting from a mental disorder or as a response to a drug.”

That doesn’t sound like what AI/LLMs are doing, at all. There are no mental disorders or drugs causing them to output what we would consider false information. The machine is not perceiving anything without an external stimulus. Everything they generate is from the stimulus we have given it.


Given the euphemism "bug" substituting for "programming error" you'd be tempted to allow something similar for LLMs, but these are not errors, the output is by design.

There is no motive for truth, just the most likely output, even if the likeliness is low.


> There is no motive for truth

This also ignores the larger question that has been a known issue for at least 2,000 years: "Quid est veritas?"


You can substitute "accuracy" or "usefulness" for "truth".


It’s the adopted term. I don’t see why it HAS to be the absolute exact closest possible term to what it would be in a human or something.

It feels a bit like saying “stop calling it e-mail! It’s got nothing to do with real mail!”


Because people feel like they have nothing to add so instead of not adding anything they decide they have an issue with some minute detail that doesn't really matter and then start raising hell.


IMO this whole concept of "hallucinations" is a made up buzzword (in the context of AI) to distract from the fact that the companies who are writing/training these models know full well that what they spit out is just as likely bullshit as it is "correct".

Saying "we have no idea if it's going to spit out something accurate" doesn't sell.

"oh it's hallucinating, how cute" is an easier sell.


Then tell us what we should call these manifestations...

It's easy to say stop calling it X, but then what are we supposed to call them?


A popular term in the LLM space is 'confabulation': https://community.openai.com/t/hallucination-vs-confabulatio...

It fits better than the alternatives I've seen proposed.


Malfunctions? Breaking? Errors? Poor reliability?


The problem I have with this is it seems too generic.

Shouldn't we try to categorize the types of errors at least somewhat?


Sure, and it's probably wise not to pick error categorizations that impute consciousness onto a thing that's 1) almost certainly not conscious, 2) behaves quite similarly to a conscious thing, and 3) might become conscious at some point in the near or distant future

Hallucinations are definitionally features of conscious experience. Pick a different word or make one up!


I guess to me hallucination doesn't necessarily imply consciousness.

"an experience involving the apparent perception of something not present"

I guess it depends on exactly how you define "experience" and "perception".

But yeah there are better words than hallucination that are even more generic and do work better.


We should call it Twitter.


Bullshitting has a goal, hallucinations are random, seems apt.


I'm not so concerned with that as I am with the fact that this isn't one. Article says

> they tend to make up fake information – errors called “hallucinations.”

Hallucinations are a certain kind of error. But what appears to have happened here is a _direct_ manipulation from Microsoft. Which is a risky play by them. It doesn't take much to erode trust. People tend to trust LLMs because they tend to get things right. But if people see a few things that they know are wrong, they will quickly stop trusting. If they see a few things as marketing, they will stop trusting even faster.

It's not a hallucination, it is a filter. Microsoft manipulated the output to prefer their own products and boy is that a risky strategy.


> Microsoft manipulated the output to prefer their own products and boy is that a risky strategy.

Makes me wonder how they plan to monetize these chatbots and if they won’t just fizzle out like voice assistants.

I don’t see how there won’t be concerns over asking a chatbot for the best pizza in town and receiving an answer like “Customers love the new Meat Lover’s Pizza from Pizza Hut! Brought to you by Pizza Hut… (list of pizza places here)”. Amazon couldn’t figure out how to make money off of Alexa, so how are chatbots any different?


Chatbots just seem like the marketing tool, if we're being honest. I can't see any way to monetize them without destroying them. Even just having them could threaten their own existence (or push blogs toward higher quality, if Google fucking decides to fix SEO...). LLMs, on the other hand, have plenty of use cases, though I think people are still way overhyped on them and aren't interested in how the boring stuff has good utility.


I wouldn't get this worked up about a simple term that keeps things understandable for a layperson, lest your head might explode once you see how people are anthropomorphizing some AI "companion" bots.


It's belief vs intent. Intent would be anthropomorphizing AI much more than belief and would denote a theory of mind. I'm not sure of a better term for things the model 'believes' to be true that are wrong. I think it's quite analogous given that the model then elaborates on the false belief in much the same way that humans appear to do with hallucinations.

Additionally, belief does not mean human; for example, animals can have beliefs, even very rudimentary animals. I think this is more a way of self-containing the entity and treating it as a black box.


I don't understand why you are so worked up about the term, and I also don't understand how your characterization of it as bullshitting and lies is accurate in any way.


Ha! Yup, one of my friends who has been working with "transformer models" for years now told me "oh yeah, it's a bullshitter" when I tried my hand and got some truly bizarre, digressive, "addled", etc. output.

OTOH, it reminded me very much of my own mind (reinforced by ADHD, in my case).

This suggests to me, at least, that "the problem" isn't these models, per se. It's more like: these are probably only one module / layer in a system more similar to our brains. Just as scientists have identified distinct regions (more) involved in, say, language production, or (direct) visual perception, or etc., I'd suggest we've only just built the first substantially more practical / realistic hack / simulation (much like 3D game engines almost always use hacks - e.g., not even using the simple "Newtonian optics" model fully [i.e., "ray tracing"]) of a sort of language cortex. I'd further guess that it's going to take some maturation of a number of methods, technologies, etc. to realistically add more "cortices", but, I do think it's quite likely to happen in approx. the "decades" range...

Highly highly speculative - rather naively based on the way other technologies have developed and with a little basis in work I've done more directly in neurobio etc. No deep(er) reason / analysis, but, just my current very tentative hypothesis.


Not sure why you're being downvoted.

Are there other opinions about the cortex or module idea? Is there a fundamental problem with that idea I'm missing?


Hinton called them "confabulations" according to this:

https://www.technologyreview.com/2023/05/02/1072528/geoffrey...


I think "confabulation" is a way more accurate term, I wish it had stuck instead of "hallucination".

A hallucination is a problem with input. Confabulation is false output.

Confabulation is when a person mistakenly recalls details and tries to "fill in the blanks", without realizing what they are saying is untrue.


Maybe we could just call it babbling.



