> Understanding what kind of tasks LLMs can and cannot reliably solve remains incredibly difficult and unintuitive.
Case in point: the other day my daughter was doing a presentation and she said "Dad can you help me find a picture of the word HELLO spelled out in vegetables?"
I was like "CAN I!!?!?! This sounds like a job for ChatGPT".
I'll tell you what: ChatGPT can give you a picture of a cat wearing a space suit drinking a martini but it definitely cannot give you the word HELLO spelled out in vegetables.
I ended up getting it to give me each individual letter of the alphabet constructed with vegetables and she pasted them together to make the words she wanted for her presentation.
It's always funny to read these stories if you know how ChatGPT actually works, because if you know about tokenization, you know why this is definitely not a good job for ChatGPT. It's exactly the same reason it can't spell STRAWBERRY. Not because it doesn't understand the concept of fruits or vegetables, or because it can't handle sophisticated concepts like metaphors or memes. It's not a good job for it because it doesn't see text the way you see it. You see the word "hello" made up of individual characters, but the model sees it as a single token (the token with id 24912 for gpt-4o, to be precise). It knows the meaning of this token and its relationship to other tokens much the same way you know the relations between words. But it has fundamentally no clue about the characters that make up this word (unless someone trained it to do so, or it picks up spurious additional relations that happen to exist in the training data).
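To make the mismatch concrete: character-level operations that are trivial when you can actually see the characters are exactly the ones a token-based model never observes directly. A quick sketch in plain Python (the token id shown is just the value quoted above, not something this snippet verifies):

```python
# Character view: answering "how many r's?" or "spell it out" is
# trivial by direct inspection of the string.
word = "strawberry"
print(word.count("r"))         # 3
print(" ".join(word.upper()))  # S T R A W B E R R Y

# Token view (schematic): the model receives only opaque ids, e.g.
# "hello" -> [24912] under gpt-4o's tokenizer (per the comment above).
# The characters inside the token are simply not part of its input.
token_view = [24912]
```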
> but the model sees it as a single token (the token with id 24912 for gpt-4o to be precise). It knows the meaning of this token and it's relationship to other tokens much the same way you know relations between words
In this context, if we assume that Deep Thought from Hitchhiker's Guide is an LLM, then the answer to everything[1] i.e. 42 makes sense. 42 is just the token id !
> But it has fundamentally no clue about the characters that make up this word (unless someone trained it to do so or by using spurious additional relations that might exist in the training data).
That was my theory as well when I first saw the strawberry test. However, it is easy to test whether they know how to spell.
The most obvious is:
> Can you spell "It is wonderful weather outside. I should go out and play.". Use capital letters, and separate each letter with a space.
The free-tier ChatGPT model is smart enough to understand the following instructions as well, which shows it's not just simple words:
> I was wondering if you can spell. When I ask you a question, answer me in capital letters, and separate each letter with a space. Where there is a real space between the words, insert the characters '--' there, so the output is easier to read. Tell me how the attention mechanism works in modern transformer language models.
Also, somebody pointed out in another HN thread that modern LLMs are great for dyslexic people, because you can typo every single word and the model still understands you perfectly. Not sure how true this is, but at least a simple example seems to work:
> Hlelo, how aer you diong. Cna you undrestnad me?
It would be interesting to know whether the datasets actually include spelling examples, or whether the models learn how to spell from the massive number of spelling mistakes in the datasets.
They can do this kind of thing, but in my experience, that makes the model feel "dumber" as far as quality of output goes (unless you make it produce normal output first before having it convert it to something else).
I wonder if there's research being done on training LLMs with extended data in analogy to the "kernel trick" for SVMs: the same way one might feed (x, x^2, x^3) rather than just x, and thus make a linear model able to reason about a nonlinear boundary, should we be feeding multimodal LLMs with not only a token-by-token but also character-by-character and pixel-by-pixel representation of prompt texts during training and inference? Or, allow them to "circle back" and request they be given that as subsequent input text, if they detect that it's relevant information for them? There's likely a lot of interesting work here.
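The feature-expansion idea above can be sketched in a few lines of plain Python (a toy illustration of the analogy, not a training proposal): a linear decision rule on raw x cannot separate the nonlinear concept "|x| > 2", but the same linear rule applied to the expanded feature x² can.

```python
# Toy data: label is 1 when |x| > 2 -- a concept no single threshold
# on raw x can capture, because the positives sit on both ends.
xs = [-4, -3, -1, 0, 1, 3, 4]
ys = [1 if abs(x) > 2 else 0 for x in xs]

def threshold_accuracy(features, labels):
    """Best accuracy achievable by any single rule of the form f > t
    (or its flipped version) on the given 1-D features."""
    best = 0.0
    candidates = sorted(set(features))
    for t in candidates + [min(candidates) - 1]:
        preds = [1 if f > t else 0 for f in features]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        best = max(best, acc, 1 - acc)  # allow the flipped rule too
    return best

print(threshold_accuracy(xs, ys))                   # < 1.0: raw x fails
print(threshold_accuracy([x * x for x in xs], ys))  # 1.0: x^2 succeeds
```

The same linear machinery becomes strictly more capable once the input representation is enriched, which is the analogy being drawn for feeding character- or pixel-level views alongside tokens.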
You can train the model to do these things in the same way you can train a blind person to describe the colors of objects. But if you put them in an unknown environment or give them a texture they've never encountered before, they will have no idea how to perceive its color. This is a fundamental problem for LLMs and won't change until someone invents a method that gets rid of tokenization for good.
Wow, it used very interesting vegetables: the very common E-shaped cucumber, and of course the commonplace O-shaped tomato with a void in the middle. Very usual vegetables; you can buy them in any grocery store.
Sometimes I wonder if anything can actually impress anyone on this site.
Words have plagued image gen since the start. Now there is an image model that, with an extremely simple prompt, does an awesome job with words.
If they expanded their prompt and played with a few seeds until an image with perfectly realistic vegetables was generated, I wonder what the next complaint would be.
When I see the prompt "the word HELLO spelled out in vegetables", I expect realistic vegetables being assembled into the appropriate shapes, such that e.g. the O is made of many different vegetables arranged in a circle.
I can’t see the pic in the shared link, but when I tried with 4o it gave me HIIILO.
EDIT: maybe it’s influenced by my custom instructions and memories. I write code all day with it and I have custom instructions specifically to get the type of output I like for code, mostly focused on brevity.