Microsoft's AI shopping announcement contains hallucinations in the demo (perfectrec.com)
90 points by craigts on July 28, 2023 | 106 comments


Is it just me or does everyone trust AI opinions less and less? Every time I ask it to find the top 5 of something, I go and double-check myself and almost always find it to be wrong. For example, try searching for the top 5 restaurants around me in Bard. Some of them don't even exist lol, and some are just random if you cross-verify with actual popularity from Yelp etc.


Using language models for location- or time-based things is not recommended, as this usually requires non-textual data. Better to use them for general knowledge questions, programming help, translation, or writing. Asking them to do any complex calculations (especially ones that also require non-text raw data, like inflation in a given time period) is also futile.


> general knowledge questions, programming help, translation, or writing.

They get all of these wrong too. It's like some AI-specific variant of the Gell-Mann amnesia effect. It's usually right in the first sentence, but if you really know the answer, it's often either very debatable or completely wrong by the halfway mark of the paragraph. Meanwhile, the associated brand authority is problematic.


They don't get them all wrong, and even when they are not 100% correct they're usually better than nothing.

For instance, I needed to write code to spawn a child process and communicate with it via stdin/stdout in C++. This is pretty easy in most modern languages, but in C++ you have to do POSIX's dumb process-spawning dance pretty much with raw syscalls: fork, execve, etc.

Rather than googling all the syscalls I would need and how to arrange them I just asked ChatGPT to do it. I've done it before so it was much easier to verify than to start from scratch.

And it got it 90% right. The only bit it got wrong was to make a single pipe and connect it to both stdin and stdout, rather than one pipe for each. But that was easy to spot and fix.
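For reference, the two-pipe dance being described maps directly onto the POSIX calls; here is a minimal sketch via Python's `os` module (same syscalls, error handling omitted, and `tr` is just a stand-in child process):

```python
import os

def spawn_child(argv):
    """Spawn argv with separate pipes for its stdin and stdout.

    Returns (pid, write_fd, read_fd). Note the two distinct pipes --
    the bug described above was wiring a single pipe to both ends.
    """
    child_stdin_r, child_stdin_w = os.pipe()    # parent writes, child reads
    child_stdout_r, child_stdout_w = os.pipe()  # child writes, parent reads

    pid = os.fork()
    if pid == 0:  # child process
        os.dup2(child_stdin_r, 0)   # stdin  <- read end of first pipe
        os.dup2(child_stdout_w, 1)  # stdout -> write end of second pipe
        # close the inherited originals so EOF propagates correctly
        for fd in (child_stdin_r, child_stdin_w, child_stdout_r, child_stdout_w):
            os.close(fd)
        os.execvp(argv[0], argv)    # replace the child's process image

    # parent: close the ends that belong to the child
    os.close(child_stdin_r)
    os.close(child_stdout_w)
    return pid, child_stdin_w, child_stdout_r

pid, w, r = spawn_child(["tr", "a-z", "A-Z"])
os.write(w, b"hello\n")
os.close(w)                  # send EOF so the child exits
out = os.read(r, 1024)
os.waitpid(pid, 0)
print(out.decode())
```

The C++ version is the same sequence with `pipe()`, `fork()`, `dup2()`, and `execvp()`.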

AI - at least for programming - is an enormous time saver. Could easily increase productivity by 50% in some cases.

In 5 years I expect it to be as normal as using an IDE. There are still people that slow themselves down by using unintelligent editors, and they will probably continue to live in the 80s, but people that use tools to help them will expect to use Copilot or similar all the time.


> In 5 years I expect it to be as normal as using an IDE.

Five years seems too conservative. Five years ago we only had GPT-1, which only generated funny word salad with acceptable syntax. An AI like ChatGPT seemed unthinkable at the time. And ChatGPT came out only last year. In five years similarly radical changes could happen. Programmers might actually get replaced with AI. Sounds too radical? But ChatGPT also would have sounded too radical five years ago!


It's silly to extrapolate breakthroughs.


AI breakthroughs have been happening at an increasing rate at least since AlexNet came out in 2012. Before that, "AI" was mostly OCR. The speed of progress is crazy. It doesn't look like a slowdown is ahead of us.


Gell-Mann is exactly what I’ve been referencing in conversation recently. Any professional will gladly explain why their field is really much too nuanced and complex for LLMs to threaten in the near term before seamlessly explaining how close we are to all those other engineers/doctors/clerks being automated right away.


In my opinion, the roles directly threatened by LLMs are coordinators, client managers, etc. People whose job depends entirely on "soft skills": interacting, giving vague summaries and status updates, and assigning tasks.

A chat bot that scans Jira, accepts phone calls, and runs scrums can't possibly be any less reliable than some of the people I've worked with.


GPT-4 outperforms average students on exams in several fields. There are a lot of benchmarks, and GPT-4 mostly does very well as long as the field relies enough on declarative knowledge.


I get different answers every time to "what is the third element in the periodic table" from llama2.

I'll hold off actually using them for now.


That model is not state of the art. Even GPT-3.5 can answer this question.


On GPT 3.5

Q: "What is the seventy fourth element of the periodic table?"

A: "The seventy-fourth element of the periodic table is Rhenium..."

But this is really shooting fish in a barrel. Given the way LLMs work why would you expect them to provide factually correct text completion?
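For the record, a quick lookup against the standard periodic table shows exactly how that answer is wrong (element names hard-coded here purely for illustration):

```python
# Atomic numbers near 74, per the standard IUPAC periodic table.
ELEMENTS = {72: "Hafnium", 73: "Tantalum", 74: "Tungsten",
            75: "Rhenium", 76: "Osmium"}

print(ELEMENTS[74])  # the correct answer to the question above
print(ELEMENTS[75])  # "Rhenium" -- the model's answer, off by one
```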


Llama2, when fine-tuned, can be better than GPT3.5

Source: a trusted coworker


It's just reality sinking in.


Well, it doesn’t surprise me, since I have been saying for a while that these LLMs hallucinate nonsense to the point where you end up triple-checking whatever they output.

LLMs thrive in applications that involve creativity and in non-serious uses, mostly fantasy or creative writing. Anyone using them seriously outside of summarization for high-risk use cases is going to be very disappointed.


Perhaps the outcome is we get better at actually checking things, not a terrible result.


I recommend LLM users leverage the RAG technique
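For readers unfamiliar with it, retrieval-augmented generation (RAG) means fetching relevant documents first and grounding the model's answer in them. Below is a minimal sketch with a toy bag-of-words retriever and invented example documents; a real system would use embedding search and an actual LLM call on the assembled prompt:

```python
import math
import re
from collections import Counter

# Toy document store; the "facts" here are invented for illustration.
DOCS = [
    "Lithium is the third element in the periodic table.",
    "Tungsten is the seventy-fourth element in the periodic table.",
    "Surface Headphones 2 is a noise-cancelling headset from Microsoft.",
]

def _vec(text):
    """Bag-of-words term counts (stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = _vec(query)
    return sorted(DOCS, key=lambda d: _cosine(q, _vec(d)), reverse=True)[:k]

def build_prompt(query):
    """Ground the model: instruct it to answer only from retrieved context."""
    context = "\n".join(retrieve(query))
    return ("Answer using ONLY the context below. If the answer is not in "
            f"the context, say you don't know.\n\nContext:\n{context}\n\n"
            f"Question: {query}")

prompt = build_prompt("What is the seventy-fourth element of the periodic table?")
print(prompt)
```

The point is that the model is asked to restate retrieved text rather than to recall facts from its weights, which reduces (but does not eliminate) hallucination.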


I'm glad that expectations are shifting. At the extremes, it's either a fancy parlor trick or a hyper-intelligent god. A lot of the original hype has skewed much closer to the hyper-intelligent god side of the spectrum. It's definitely not a fancy parlor trick, but it's likely closer to that than the other side it's being hyped as.


I think the most amusing comment I've read here in the last few weeks called it "demented Clippy".


My trust factor for online opinion is ranked:

1) Online forums (adding 'reddit' or 'hacker news' to a search query)
2) GPT-4
3) Google search


There is information, here, in the observations that all these "AI" demos contain blatant inaccuracies, with apparently no fact-checking having taken place. It's clear that these companies (Microsoft, Google, OpenAI) do not care about accuracy, correctness, or the truth. It is not part of their business model.

There is no respect for your time, your safety, your reputation. Your role as a customer is to be conned into using the products for long enough that a return on investment can be made; the companies will pivot to a new product as soon as the untrustworthiness of the old one becomes common knowledge.

Short-term thinking. Desperation.


A hallucination is an unexpected emergence.

The 'making up' of facts, because the model cannot tell fact from fiction, is entirely expected behavior.

There is no 'hallucination' as the behavior is anticipated, expected, and entirely within normal operations processes.

The bullshit comes from there being no model of trust these AIs subscribe to. I'd love-love-love to see these AI producers be held to some responsibility to verification of truth and ethics.

These companies/universities/groups allowing their applications to bald-faced-lie (misrepresent data with authority) to citizens should be a top priority to bash-in-the-face by legislators around the world.


> There is no 'hallucination' as the behavior is anticipated, expected, and entirely within normal operations processes.

Exactly. These are models that predict text sequences. These sequences often semantically express falsehoods, but the model's not "lying", it's not "hallucinating", and it's definitely not malfunctioning. It's doing exactly what it was designed to do.

There definitely are "lies" and "hallucinations" here though ... but they're coming from the hype-cycle-hucksters trying to convince us that this whole process somehow resembles "intelligence".


It clearly has some level of intelligence, though it’s pretty far from human level. The hallucinations don’t make it less intelligent because it’s not “trying” to avoid them, as you seem to know already


> It clearly has some level of intelligence

Absolutely not, this is not remotely "clear", and it's a very strange thing to assert.

> The hallucinations don’t make it less intelligent because it’s not “trying” to avoid them, as you seem to know already

What? No. What does "as you seem to know already" mean in this context?


> Absolutely not, this is not remotely "clear", and it's a very strange thing to assert.

It depends how you define intelligence, but I would say intelligence is the ability to find the best action to take to achieve a certain goal, and AI can do that reasonably well

> What does "as you seem to know already" mean in this context?

It means that based on the comment I was replying to the person seems to already understand what I just said


> It clearly has some level of intelligence

https://plato.stanford.edu/entries/chinese-room/#LargPhilIss...


To me the Chinese Room thought experiment seems like it's meant to show that AIs can be intelligent, not the opposite?

"Searle could receive Chinese characters through a slot in the door, process them according to the program's instructions, and produce Chinese characters as output, without understanding any of the content of the Chinese writing."

Sure, but that doesn't mean the state of the program doesn't contain any understanding or intelligence, it's just that the human doesn't have a high-level view that can be used to decode that internal state. We're not asking whether the computer chip itself understands things but whether the something contained in the program running on it does. The human could also run a physics simulation as in https://xkcd.com/505/ and recreate a human brain which would be no different to a physical brain in terms of behavior and so there would be no reason not to call it intelligent


You're misunderstanding the thought experiment then. By definition the person inside the Chinese Room doesn't understand Chinese.

> but that doesn't mean the state of the program doesn't contain any understanding or intelligence

Programs don't contain understanding or intelligence, they contain instructions.

> We're not asking whether the computer chip itself understands things but whether the something contained in the program running on it does.

I feel like you're saying "I'm not accusing the blender of being intelligent, I'm saying the recipe for this margarita is self-aware." It doesn't matter if it's hardware or software; neither is capable of understanding, because understanding is a conscious experience, and neither a blender nor a recipe is sentient.

> The human could also run a physics simulation

Cool XKCD, but I'm not arguing about whether AI is possible. Just pointing out that convolutional neural networks are not self-aware or intelligent or actually learning (at least not yet).


> “You're misunderstanding the thought experiment then.”

So if I don’t agree with it, I’m misunderstanding it? It even says in the Wikipedia article for it:

> "The overwhelming majority", notes BBS editor Stevan Harnad, "still think that the Chinese Room Argument is dead wrong".

So don’t try to pretend it’s some absolute truth, it’s just a flawed argument

> Programs don't contain understanding or intelligence, they contain instructions.

Why can intelligence and understanding not come from a sufficiently complex set of instructions?

> understanding is a conscious experience and neither a blender nor a recipe are sentient.

That’s an odd definition of understanding. By my definition understanding is having information about something and the ability to process it such that you can effectively predict its behaviour and possibly take actions to change its state to fit a goal. I guess you will always win if you redefine all the words to mean what you want. Your definition is useless because it’s unfalsifiable because you can’t measure whether something is “sentient”

> Just pointing out that convolutional neural networks are not self aware or intelligent or actually learning

Self aware? Probably no

Intelligent? To some extent, yes

Learning? Of course they are, I don’t see how you can argue that they aren’t


> So if I don’t agree with it, I’m misunderstanding it?

Now you're misunderstanding me. I'm not saying you're not allowed to have a different opinion on the full thought experiment. You're assuming intelligence in the setup of the thought experiment and that is objectively not how it is meant to be interpreted.

> Why can intelligence and understanding not come from a sufficiently complex set of instructions?

Again I didn't say it can't just that convolutional neural networks as they currently exist are not that complex. It's a fancy Markov chain.

> I guess you will always win if you redefine all the words to mean what you want.

You say directly after making up your own definition of intelligence. I'm not interested in discussing your definition of intelligence or the definition of intelligence, I'm talking about this specific application of technology and if it meets a common definition of intelligence. Please point to a dictionary definition if you wanna continue this back and forth

> Learning? Of course they are, I don’t see how you can argue that they aren’t

Because learning has a definition. There's a reason AI researchers call it "Training" and not "Teaching"


> You say directly after making up your own definition of intelligence.

I guess but I think the one I’m using is more common and useful. The Google dictionary says “the ability to acquire and apply knowledge and skills” which is closer to mine (having knowledge and the ability to apply it) than yours (some abstract idea of consciousness that can’t be measured)

> Theres a reason AI researchers call it "Training" and not "Teaching"

They also call it machine learning


> the ability to acquire and apply knowledge and skills

Ok but what is knowledge? You need to follow that rabbit hole. Knowledge isn't just data. You'll find that knowledge is frequently defined with some tie in to experience and the definition of experience is tied to consciousness.

> They also call it machine learning

They have called the field Artificial Intelligence (or ML) since 1956 but that doesn't mean they had an example of an instance of artificial intelligence. It's just the name of the field. I've never heard of a researcher referring to the act of training as "machine learning" though, just the field.


Speaking with GPT-4, it is hard to deny the conjecture that its weights encode an internal world model somewhere.

If so, the difficulty is not that the model has no conception of truth and falsity, it is rather to motivate the model to tell the truth. Or more precisely, to let the model be honest, to only tell things it believes to be true, things which are part of its world model.

Unfortunately, we can't just tell the model to be honest, since we can't distinguish between responses the model does or does not believe to be true. With RLHF fine-tuning, we can train the model to tend to give answers the human raters believe to be true. But we want the model to tell what it believes to be true, not what it believes that we believe is true!

For example, human raters may overwhelmingly rate response X as false, but the model, having read the entire Internet, may have come to the conclusion that X is true. So RLHF would train it to lie about X, to answer not-X instead of X.

This problem could turn out to be fatal when a model becomes significantly smarter than humans, because this means it would less often believe according to human biases and misconceptions, so it would learn to be deceptive and to tell us only what we want to believe. This could have frightening consequences if this leads it to conceal any of its possible misalignments with human values from us.


It is, like you said, conjecture. The best we can say is that it _usually_ provides responses that are _consistent_ with responses coming from an intelligence with an internal world model. That doesn't mean that's the only way to get those responses, nor does it mean that this is necessarily what's happening in this case.

So saying things like "the model has come to the conclusion that" or "smarter than", or "learns to be deceptive", I think that's premature at best. I'm not yet convinced that there's sufficient evidence to show appreciable internal state and logical processes. There's so, so many examples where what looks like legit understanding breaks down with the slightest tweak to the prompt, and it goes from looking like a savant to someone high on just a tremendous amount of LSD.

If there was an internal world model that just wasn't correct, I would expect to see its incorrect answers be at least logically consistent, but instead it looks way, way more like the trick just doesn't work for this case.

So to get back to the original point, this is MS trying to leverage this trick to do a task that requires actual logical reasoning, factual evaluation, and internal world state, and we're just not there. (I hesitate to use the word "yet", because there's still a lot of not-yet-conclusive discussion around whether current LLM techniques will ever get us "there." Colour me tentatively pessimistic in the meantime. =) )


because it cannot determine a fact from fiction

This is way too narrow. Even if it were able to determine fact from fiction, a neural network would still be able to hallucinate as long as it has no ontology: if it doesn't "know" the boundary between objects it has no way of knowing the atomicity of its facts, so it will inevitably combine even known "facts" into falsehoods.

To illustrate, the following fact-based syllogism would sound perfectly valid in the absence of a working ontology:

  A: That green flask costs $10
  B: This flask is green
  => This flask costs $10


"Lies are attempts to hide the truth by willfully denying facts. Fiction, on the other hand, is an attempt to reveal the truth by ignoring facts." — John Green


Bing works for Microsoft and basically that's an ad. Wouldn't any human paid by Microsoft say in an ad that Surface Headphones 2 are the best ANC headphones?


Pretty soon some LLM owner is going to use the argument "Everyone is allowed to have their own opinions, and LLMs are too, their responses don't have to line up with someone else's preferences."


Alternative Intelligence


Opinion pieces like shopping recommendations are quite hard for current LLMs. Either it's a hard fact, or it's pure creative work; that's where AI shines. Anything in between and things get tricky


This is one of those areas where the poor quality of the data influences the output, I think.

There are so many garbage, lazily written product reviews, by websites that only exist to get people to click affiliate links. These sites only have one goal, which is to get you to click an affiliate link and make a purchase. So it is not in their best interest to say "You shouldn't buy this."

Rather, they make a list of "top X Foobars", they start with a really expensive one, then they follow with a more reasonably-priced one, and give it a very positive review. It leads to clicks and purchases.

Given this, it's not surprising to me that even the best LLMs carry pieces of this with them. Ask it to predict text describing some tech product on a sales page, and of course parts of that low-quality data will bleed through.


There is an argument to be made for automatically downweighting (be it training epochs or pagerank rating) anything with affiliate links. But I guess it would be trivial to hide them behind a redirect.

That being said, I recently asked the Bing chatbot about the difference between two similar sounding printer models, and it gave a good explanation which I previously couldn't quickly find via Google. In case of Bing it is sometimes not completely clear to which degree its answer depends on the Web search, if it performed one, and to which degree it is just answering from its background knowledge (which could be prone to hallucination, but is less "gullible", so to speak). It provides sources, but not everything it says is necessarily present in the source. I'm actually surprised how quickly Bing is able to search (load and read) multiple websites, given that the loading times are not always trivial. It turns out they are much faster at reading than at typing. Indeed, each forward pass reads the entire context window, so once for every generated token!


For sure. The garbage in, garbage out problem is quite real for ecommerce applications.


Is anyone shipping AI products that DO NOT contain hallucinations? I thought that was pretty much a given.


Well, there isn't a human that never "hallucinates" in the sense we use for LLMs, i.e. confidently gives incorrect answers.

Human brains use lots of heuristics. We don't "think step by step" through everything; instead we rapidly construct an answer for almost everything.

What we call "hallucinations" in AI shows up in humans as misspeaking, misremembering things, off-by-one math/counting errors, misidentifying someone, using the wrong variable/method when programming, etc.


Hallucinating is roughly how they work, we just label it as such when it's something obviously weird


This is something I'm not sure people understand.

LLMs only make a "best guess" for each next token. That's it. When it's wrong we call it a "hallucination", but really the entire thing was a "hallucination" to begin with.

This is also analogous to humans - who also "hallucinate" incorrect answers, usually "hallucinate" incorrect answers less when they "Think through this step by step before giving your answer", etc.
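The "best guess per token" mechanism can be illustrated with a toy bigram model; this is a deliberately tiny stand-in (real LLMs are transformers), but the greedy decoding loop is the same idea. Note that the model emits the statistically likely continuation regardless of whether it happens to be true:

```python
from collections import Counter, defaultdict

# Toy training corpus: one statement simply occurs more often than another.
CORPUS = ("the third element is lithium . "
          "the third element is lithium . "
          "the third element is boron .").split()

# "Train" a bigram model: count which token follows which.
model = defaultdict(Counter)
for prev, nxt in zip(CORPUS, CORPUS[1:]):
    model[prev][nxt] += 1

def complete(prompt, n=4):
    """Greedy decoding: always emit the single most likely next token."""
    tokens = prompt.split()
    for _ in range(n):
        candidates = model[tokens[-1]].most_common(1)
        if not candidates:
            break
        tokens.append(candidates[0][0])
    return " ".join(tokens)

print(complete("the third"))
```

Whether the completion is a fact or a "hallucination" depends entirely on what dominated the training data; the mechanism never changes.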


Yeah. These "lies" are just artifacts of the way that LLMs work. They're meant to predict likely text given a prompt. And they do. If tasked with "write some marketing or a buying guide for product X", they will simulate likely marketing blurbs; truth has nothing to do with it, that's not their wheelhouse. Prediction is a very different function, algorithm, and problem set from something like "accurately summarize existing reviews". This is a feature, not a bug. If you use something off-label, you'll get off-label results. MSFT should know better.


I'm sorry but watching people talk about the vast majority of the AI landscape is like watching people talk about FSD. Have fun on the hype treadmill.


Why hasn't their stock plummeted like Google's?


Stop calling them hallucinations. If we're going to anthropomorphize AIs, let's just call it bullshitting and lies. If we're not going to anthropomorphize AIs, then we need a different term.


> If we're going to anthropomorphize AIs, let's just call it bullshitting and lies.

why? "bullshitting and lies" suggests that the AI is intentionally being deceptive. "hallucinations" conveys the idea that the information is incorrect, but the AI perceives it to be correct, which is more in line with what is actually happening.


I'd go further and say that the AI doesn't even perceive it to be "correct". It's just saying these words are likely to follow those words.



That seems more technically accurate but less likely to catch on since it's a much less common word. I think my parents are a lot more likely to understand from context a news story that mentions that "lawyers relied on an AI that hallucinated court cases" than "lawyers relied on an AI that confabulated court cases".


I feel like "fabrication" is just sitting right there begging to be used.


Bullshitting and lies is what the humans selling the AI-powered services are doing. Hallucination, delusion and confabulation are what the AIs are doing (and some of the humans, too).


"Making shit up in order to fulfill some requirement" is the definition of lying, so whether it's a human or an AI, just making shit up in order to generate prompted output is flat-out lying. Not "hallucinating". And the best part is that until LLMs get validity checks baked in, even the things they get right are lies if presented with authority, because the LLM doesn't know whether it's true or not. In fact, the LLM doesn't know, full stop. It's still just a very well-crafted autocomplete, and literally nothing more. So if we're going to anthropomorphize, call them what they'd be when humans do the same:

lies, and damned lies.


> "Making shit up in order to fulfill some requirement" is the definition of lying

I'd argue that there is an element of intent or agency involved. When a human makes things up intentionally or by choice, that is lying. When they do it unintentionally, that is not lying. It is usually called confabulation (or, honest lying - where the actor does not know they are not telling the truth). I don't think AIs/LLMs have agency or the ability to make things up intentionally. They are just doing what they are programmed to do and everything they produce looks the same to them. It is all true as far as the LLM is concerned. They might be confabulating, but I don't think they are lying.


The AI intentionally makes things up because that's literally the whole point of the LLM concept, both abstractly and concretely. WE MAKE IT LIE by conditioning it to lie. The "AI" part is not separate from the humans who made it, we are part of the system, and we made a computer that constantly and continuously lies in order to generate seemingly credible responses.

We made a computer that lies, all the time, about everything.

"...Why?"

<insert the shouting robot comic here>


Your argument is that any untruth is a lie.

Do you consider fiction authors to be liars?


No, my argument is that if you don't KNOW what truth is, everything you say is a lie.


If I make a claim based on prior knowledge and statistics I’ve learned over time, it’s not lying if it’s wrong. Lying has intent. Plenty of people say incorrect facts that they think are correct.

In second grade, my cousin talked a lot about flax farmers in South America, after learning about them in class. Turns out the lesson was on quinoa farmers, and he forgot the original produce and “hallucinated” the statistics about flax farmers instead. Technically the term is confabulation. Was he lying? No because he wasn’t trying to tell us fake facts.

LLMs have no intention of being wrong. Their “hallucinations” or whatever are just whatever makes sense from their statistical models. They’re really just confabulations.


"Bullshitting" seems like a good term for accurate or inaccurate responses.

Let's extend "LLMs have no intention of being wrong" to "LLMs have no inherent sense of being correct" - sometimes their predictions happen to be correct, sometimes they don't. But they're all hallucinations generated from the same process.


Bullshitting always requires a hidden intention to manipulate.


Nah, it can be just talking without much rigor or verification of fact or anything, often with a loose boundary between opinion and fact - the "to talk in an exaggerated or foolish manner" definition.

As in "my buddies and I were bullshitting about movies the other day."

ChatGPT definitely talks with an exaggerated manner confidence-wise.


Mislead is probably a more accurate term. Bullshit exists outside of the correct/incorrect binary and serves only to build narrative.


To be fair, if we are going to anthropomorphize it, bullshit and lies implies some sort of negative intent that I’m not sure the models have.

Bullshit is probably the closest, as people will bullshit for all sorts of reasons, but hallucinations is at least intent-neutral, which I think is the point.


A person can 100% believe in the lies they've been told, but that person is not hallucinating.

Take for example climate change deniers; apart from the corporations and the politicians that abuse scepticism to maintain their power and wealth, many of the most fervent deniers truly believe the nonsense they're saying.

Perhaps a more neutral term like "falsehoods" is applicable here.


I think calling it lying requires intent, I’d just say those people are wrong rather than lying


No, not true: lazy, imperfect, and damaged cognitive functions have similar results.


In classification problems there’s a useful term for something similar already — False Positives

   false positive (FP), Type I error
   A test result which wrongly indicates that a particular condition or attribute is present
https://en.m.wikipedia.org/wiki/Confusion_matrix

Edit — Though I’m not sure how well that fits for a LLM (it’s more a series of false positives at each step of prediction in the sequence).
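For reference, the four standard counts look like this in code (a generic binary-classification sketch, not specific to LLMs):

```python
def confusion_counts(y_true, y_pred):
    """Tally the four cells of a binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t and p),
        "FP": sum(1 for t, p in pairs if not t and p),      # Type I error
        "FN": sum(1 for t, p in pairs if t and not p),      # Type II error
        "TN": sum(1 for t, p in pairs if not t and not p),
    }

# Ground truth vs. a classifier that wrongly flags one negative case:
print(confusion_counts([1, 0, 1, 0], [1, 1, 1, 0]))
```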


In psychology, we've got a term which is almost 100% matching: confabulation. The only part which isn't correct is association with brain damage.

https://en.wikipedia.org/wiki/Confabulation

In psychology, confabulation is a memory error defined as the production of fabricated, distorted, or misinterpreted memories about oneself or the world. It is generally associated with certain types of brain damage (especially aneurysm in the anterior communicating artery) or a specific subset of dementias.


I think it's a good term for this, but it also sidesteps the issue that this isn't actually intelligence coming up with it. It's just machine noise


It doesn’t matter if it’s noise, it only needs to be useful. Confabulations are not only not useful, they’re actively harmful.


Call them confabulations.

"Confabulation refers to the production or creation of false or erroneous memories without the intent to deceive, sometimes called 'honest lying'"

"Confabulation is the creation of false memories in the absence of intentions of deception. Individuals who confabulate have no recognition that the information being relayed to others is fabricated. Confabulating individuals are not intentionally being deceptive and sincerely believe the information they are communicating to be genuine and accurate."

https://clinmedjournals.org/articles/ijnn/international-jour...


The words confabulation and lie have the same problem when applied to the current state of "AI": they embody a certain level of intent; in the former case, the implication is that there was no intent to deceive, while a lie is the opposite. Still, they both imply intent, and as far as I know nobody has been able to conclusively demonstrate intent on the part of a chatbot.

Hallucination doesn't require intent.


I like "fabrication" because it correctly implies the data is being pulled from somewhere and assembled. It does not put a judgment on the initial data, nor on the assembled product.

It can take on a positive or negative meaning, depending on the context.

"ChatGPT fabricated an answer that was technically correct, but misleading," or "ChatGPT was able to fabricate an innovative solution that had eluded us."

Edit: And of course, everyone's favorite, "ChatGPT found guilty of fabricating case citations."

https://www.techspot.com/news/98860-chatgpt-found-guilty-fab...


Hallucinations are defined as “Perception of visual, auditory, tactile, olfactory, or gustatory experiences without an external stimulus and with a compelling sense of their reality, usually resulting from a mental disorder or as a response to a drug.”

That doesn’t sound like what AI/LLMs are doing, at all. There are no mental disorders or drugs causing them to output what we would consider false information. The machine is not perceiving anything without an external stimulus. Everything they generate is from the stimulus we have given it.


Given the euphemism "bug" substituting for "programming error" you'd be tempted to allow something similar for LLMs, but these are not errors, the output is by design.

There is no motive for truth, just the most likely output, even if the likeliness is low.


> There is no motive for truth

This also ignores the larger question that has been a known issue for at least 2,000 years: "Quid est veritas?"


You can substitute "accuracy" or "usefulness" for "truth".


It’s the adopted term. I don’t see why it HAS to be the absolute exact closest possible term to what it would be in a human or something.

It feels a bit like saying “stop calling it e-mail! It’s got nothing to do with real mail!”


Because people feel like they have nothing to add so instead of not adding anything they decide they have an issue with some minute detail that doesn't really matter and then start raising hell.


IMO this whole concept of "hallucinations" is a made up buzzword (in the context of AI) to distract from the fact that the companies who are writing/training these models know full well that what they spit out is just as likely bullshit as it is "correct".

Saying "we have no idea if it's going to spit out something accurate" doesn't sell.

"oh it's hallucinating, how cute" is an easier sell.


Then tell us what we should call these manifestations...

It's easy to say stop calling it X, but then what are we supposed to call them?


A popular term in the LLM space is 'confabulation': https://community.openai.com/t/hallucination-vs-confabulatio...

It fits better than the alternatives I've seen proposed.


Malfunctions? Breaking? Errors? Poor reliability?


The problem I have with this is it seems too generic.

Shouldn't we try to categorize the types of errors at least somewhat?


Sure, and it's probably wise not to pick error categorizations that impute consciousness onto a thing that's 1) almost certainly not conscious, 2) behaves quite similarly to a conscious thing, and 3) might become conscious at some point in the near or distant future

Hallucinations are definitionally features of conscious experience. Pick a different word or make one up!


I guess to me hallucination doesn't necessarily imply consciousness.

"an experience involving the apparent perception of something not present"

I guess it depends on exactly how you define "experience" and "perception".

But yeah there are better words than hallucination that are even more generic and do work better.


We should call it Twitter.


Bullshitting has a goal, hallucinations are random, seems apt.


I'm not so concerned with that as I am with the fact that this isn't one. Article says

> they tend to make up fake information – errors called “hallucinations.”

Hallucinations are a certain kind of error. But what appears to have happened here is a _direct_ manipulation from Microsoft. Which is a risky play by them. It doesn't take much to erode trust. People tend to trust LLMs because they tend to get things right. But if people see a few things that they know are wrong, they will quickly stop trusting. If they see a few things as marketing, they will stop trusting even faster.

It's not a hallucination, it is a filter. Microsoft manipulated the output to prefer their own products and boy is that a risky strategy.


> Microsoft manipulated the output to prefer their own products and boy is that a risky strategy.

Makes me wonder how they plan to monetize these chatbots and if they won’t just fizzle out like voice assistants.

I don’t see how there won’t be concerns over asking a chatbot for the best pizza in town and receiving an answer like “Customers love the new Meat Lover’s Pizza from Pizza Hut! Brought to you by Pizza Hut… (list of pizza places here)”. Amazon couldn’t figure out how to make money off of Alexa, so how are chatbots any different?


Chatbots just seem like the marketing tool, if we're being honest. I can't see any way to monetize them without destroying them. Even just having them could threaten their own existence (or push blogs toward higher quality, if Google fucking decides to fix SEO...). LLMs, on the other hand, have plenty of use cases, though I think people are still way overhyped on them and aren't interested in how the boring stuff has good utility.


I wouldn't get this worked up about a simple term that keeps things understandable for a layperson, lest your head might explode once you see how people are anthropomorphizing some AI "companion" bots.


It's belief vs intent. Intent would be anthropomorphizing AI much more than belief and would denote a theory of mind. I'm not sure of a better term for things the model 'believes' to be true that are wrong. I think it's quite analogous given that the model then elaborates on the false belief in much the same way that humans appear to do with hallucinations.

Additionally, belief does not mean human; for example, animals can have beliefs, even very rudimentary animals. I think this is more a way of self-containing the entity and treating it as a black box.


I don't understand why you are so worked up about the term, and I also don't understand how your characterization of it as bullshitting and lies is accurate in any way.


Ha! Yup, one of my friends who has been working with "transformer models" for years now told me "oh yeah, it's a bullshitter" when I tried my hand and got some truly bizarre, digressive, "addled", etc. output.

OTOH, it reminded me very much of my own mind (reinforced by ADHD, in my case).

This suggests to me, at least, that "the problem" isn't these models, per se. It's more like: these are probably only one module / layer in a system more similar to our brains. Just as scientists have identified distinct regions (more) involved in, say, language production, or (direct) visual perception, or etc., I'd suggest we've only just built the first substantially more practical / realistic hack / simulation (much like 3D game engines almost always use hacks - e.g., not even using the simple "Newtonian optics" model fully [i.e., "ray tracing"]) of a sort of language cortex. I'd further guess that it's going to take some maturation of a number of methods, technologies, etc. to realistically add more "cortices", but, I do think it's quite likely to happen in approx. the "decades" range...

Highly highly speculative - rather naively based on the way other technologies have developed and with a little basis in work I've done more directly in neurobio etc. No deep(er) reason / analysis, but, just my current very tentative hypothesis.


Not sure why you're being downvoted.

Are there other opinions about the cortex or module idea? Is there a fundamental problem with that idea I'm missing?


Hinton called them "confabulations" according to this:

https://www.technologyreview.com/2023/05/02/1072528/geoffrey...


I think "confabulation" is a way more accurate term, I wish it had stuck instead of "hallucination".

A hallucination is a problem with input. Confabulation is false output.

Confabulation is when a person mistakenly recalls details and tries to "fill in the blanks", without realizing what they are saying is untrue.


Maybe we could just call it babbling.



