
https://i.imgur.com/xsFKqsI.png

"Draw a picture of a full glass of wine, ie a wine glass which is full to the brim with red wine and almost at the point of spilling over... Zoom out to show the full wine glass, and add a caption to the top which says "HELL YEAH". Keep the wine level of the glass exactly the same."


Maybe the "HELL YEAH" added a "party implication" which shifted its "thinking" into just-correct-enough latent space that it was able to actually hunt down some image somewhere in its training data of a truly full glass of wine.

I almost wonder if prompting it "similar to a full glass of beer" would get it shifted just enough.


Can't replicate. Maybe the rollout is staggered? Using Plus from Europe, it's consistently giving me a half full glass.


I am using Plus from Australia, and while I am not getting a full glass, nor am I getting a half full glass. The glass I'm getting is half empty.


Surprised it isn't fully empty for being upside down!


That's funny. HN hates funny. Enjoy your shadowban.


Yeah. I understand that this site doesn’t want to become Reddit, but it really has an allergy to comedy, and it’s sad. God forbid you use sarcasm: half the people here won’t understand it, and the other half will say it’s not appropriate for healthy discussion…


Good example in this very discussion: https://news.ycombinator.com/item?id=43477003


I like this site, but it can become inhuman sometimes.

People get upvoted for pedantry rather than furthering a conversation, e.g.


Is it drawing the image from top to bottom very slowly over the course of at least 30 seconds? If not, then you're using DALL-E, not 4o image generation.


This top to bottom drawing – does this tell us anything about the underlying model architecture? AFAIK diffusion models do not work like that: they denoise the full frame over many steps. In the past there were attempts to synthesize a picture slowly by predicting the next pixel, but I wasn't aware of a shift to that kind of architecture at OpenAI.


Yes, the model card explicitly says it's autoregressive, not diffusion. And it's not a separate model; it's a native ability of GPT-4o, which is a multimodal model. They just didn't make this ability public until now. I assume they worked on fine-tuning to improve prompt following.
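A minimal sketch of why autoregressive decoding would produce that top-to-bottom reveal, assuming a hypothetical `model` callable that returns next-token logits over an image-token vocabulary (the real GPT-4o decoder is not public, so every name here is illustrative):

```python
import numpy as np

def generate_image_tokens(model, height=24, width=24, temperature=1.0):
    """Toy autoregressive decode of an image-token grid in raster order.

    Unlike diffusion, which refines the whole frame over many denoising
    steps, an autoregressive decoder emits one token at a time from the
    top-left to the bottom-right, so a partial decode is a finished top
    strip over a blank bottom -- matching the top-to-bottom reveal.
    """
    rng = np.random.default_rng(0)
    tokens = []
    for _ in range(height * width):
        logits = model(tokens)              # condition on everything emitted so far
        probs = np.exp(logits / temperature)
        probs /= probs.sum()                # softmax over the token vocabulary
        tokens.append(rng.choice(len(probs), p=probs))
    return np.array(tokens).reshape(height, width)
```

Each sampled grid cell would then be mapped back to pixels by a separate decoder; the point is only that generation order is spatial, row by row.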


Apparently it's not diffusion, but tokens.


Works for me as well https://chatgpt.com/share/67e3f838-63fc-8000-ab94-5d10626397...

USA, but VPN set to exit in Canada at time of request (I think).


The EU got the drunk version. And a good drunk knows never to top off a glass of wine. In that context the glass is already "full".

But aside from that, it would only be comparable if we could compare your prompts.


Maybe it's half empty.


ha


You might still be on DALL-E. Mine still is when I use ChatGPT.

I switched over to the sora.com domain and now I have access to it.


The free site even has it. Just don't turn on image generation; it works with that setting off. If you enable it, it uses DALL-E.


Most interesting thing to me is the spelling is correct.

I'm not a heavy user of AI or image generation in general, so is this also part of the new release or has this been fixed silently since last I tried?


It very much looks like a side effect of this new architecture. In my experience, text looks much better in recent DALL-E images (so what ChatGPT was using before), but it is still noticeably mangled when printing more than a few letters. This model update seems to improve text rendering by a lot, at least as long as the content is clearly specified.

However, when giving a prompt that requires the model to come up with the text itself, it still seems to struggle a bit, as can be seen in this hilarious example from the post: https://images.ctfassets.net/kftzwdyauwt9/21nVyfD2KFeriJXUNL...


The periodic table is absolutely hilarious, I didn't know LLMs had finally mastered absurdist humor.


Yeah who wouldn't love a dip in the sulphur pool. But back to the question, why can't such a model recognize letters as such? It cannot be trained to pay special attention to characters? How come it can print an anatomically correct eye but not differentiate between P and Z?


I think the model has not decided if it should print a P or a Z, so you end up with something halfway between the two.

It's a side effect of the entire model being differentiable: there is always some halfway point.
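A toy illustration of that "always some halfway point" idea: interpolate linearly between two crude glyph bitmaps (invented here, not from any real font or model). A differentiable model can slide smoothly from one to the other, and every intermediate output is a mush that is neither letter:

```python
import numpy as np

# Two crude 5x5 glyph bitmaps, purely illustrative.
P = np.array([[1, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0]], dtype=float)

Z = np.array([[1, 1, 1, 1, 1],
              [0, 0, 0, 1, 0],
              [0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0],
              [1, 1, 1, 1, 1]], dtype=float)

def blend(alpha: float) -> np.ndarray:
    """Differentiable path between the glyphs: alpha=0 is P, alpha=1 is Z,
    and every alpha in between is a half-inked in-between of both."""
    return (1 - alpha) * P + alpha * Z
```

A discrete choice ("print P" or "print Z") has no such middle ground, which is why undecided text comes out as smeared pseudo-letters rather than a wrong but clean letter.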


The head of foam on that glass of wine is perfect!


I think we're really fscked, because even AI image detectors think the images are genuine. They look great in Photoshop forensics too. I hope the arms race between generators and detectors doesn't stop here.


We're not. This PNG image of a wine glass has JPEG compression artefacts leaking in from JPEG training data. Zoom in and you will see the 8x8 block boundaries used by JPEG compression, which simply cannot occur in a native PNG. This is a common method for detecting AI-generated images, and it still works: no need for complex Photoshop forensics or AI detectors, just zoom in and check for compression artefacts. Current models are incapable of getting this right. All the compression algorithms are mixed and mashed together in the training data, so in a generated image you can find artefacts from almost all of them if you're lucky, but JPEG is prevalent, obviously, since lossless images are rare online.
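That zoom-in check can be sketched numerically, assuming nothing beyond NumPy: measure whether pixel differences straddling 8x8 block boundaries are systematically larger than differences inside blocks. This is a rough heuristic for illustration, not a forensic tool:

```python
import numpy as np

def blockiness_score(gray: np.ndarray) -> float:
    """Ratio of pixel differences across 8x8 block boundaries vs inside blocks.

    gray: 2-D array of luma values. Compares the mean absolute
    column-to-column difference at columns 7|8, 15|16, ... against the
    mean everywhere else. Ratios well above 1.0 suggest JPEG-style
    blocking, even in a file that was saved losslessly as PNG.
    """
    diffs = np.abs(np.diff(gray.astype(float), axis=1))  # shape (H, W-1)
    cols = np.arange(diffs.shape[1])
    boundary = diffs[:, cols % 8 == 7]   # differences straddling a block edge
    interior = diffs[:, cols % 8 != 7]
    return boundary.mean() / interior.mean()
```

On a real file you would pass in the decoded luma channel (e.g. via Pillow's `Image.convert("L")`) and ideally repeat the check along rows; a natural photo or a clean PNG should score near 1.0.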


If JPEG compression is the only evident flaw, this kind of reinforces my point, as most of these images will end up shared as processed JPEG/WebP on social media.


You didn't get it. The image contains ALL compression artifacts from different algorithms mashed up in a single picture, the JPEG is just prevalent.


Oh, I see. There's still room for reliable detection then.


Plenty of real PNG images have JPEG artifacts because they were once JPEGs off someone's phone...


So I'm one of these seemingly non-verbal thinkers, including when I code.

I think it makes me more capable of _making use of_ complex concepts. I came into programming through mathematics, and I treat them both as aesthetic exercises. When I'm building a system in my head the solution usually appears visually, and ideas overlay themselves over the problem as aesthetic "feels". Yes it's a lot like being a visual designer: I can step back, view the solution, and just 'see' if it looks right.

Why should we structure our solution like this? I can't easily put it into words but it just... would be more natural like this. And then a few days later the reason it was correct becomes liminal and I can explain it properly. It lets me hold more ideas in my head and make use of them all at once. When picking up a new idea I can grasp the underlying concept, see the symmetry with ideas I already understand, and slot it into place.

Of course it has major downsides too. It's an effort for me to put my full ideas into words. Coding, like anything worth doing, is a team sport. If I can't vocalize my ideas then half the time that makes them worthless, especially when the decisions are important and therefore contested. I tend to make mental jumps that lose other people, and lose track of what state other people have.

Also, and this is in line with what you said Aedron, it does make it harder to check the details. I'll make silly mistakes because checking them isn't part of my mental construct. I can chase a half-formed idea for a day before realizing my mental picture was off, and I didn't catch it because I never put the problem into words. Practical-but-ugly hacks don't occur to me because they aren't aesthetic. I'm worthless at remembering my girlfriend's friends' names.

This year I'm focusing on moving slower, writing more things down, and talking to people more. So far it's been really helpful. But I don't think I'd have got to where I am now, or be able to solve the kinds of problems I do, if I was a mostly verbal thinker.


Podcast pro tip: listen on 0.8x speed. Everything becomes so dreamy and sleepy sounding.


People are saying this is 'basically the same as autocorrect or predictive text'. It's not. Autocorrect doesn't make any creative decisions for you, whereas this does.

That is to say: we think at the level of words, not letter-by-letter. When I make a typo, autocorrect corrects what my hands do to match what my brain is thinking. My brain still has primacy. This thing sits at the level of words and even sentences: if it's autocorrect, it's working to correct what my brain is thinking. Which is creepy and sad.

It's a little bit more like predictive text I admit. But because predictive text only suggests one word at a time, there's little semantic meaning to a suggestion and it's rare that I have my thoughts distracted or changed because of it. It's still largely a convenience tool. Suggesting a full sentence is shaping the direction of your thought, which is very different.

I'm still horrified that Google has put this out.


This is awful. We need _less_ mediation and commodification of our personal interactions, not more. What is the use of this? At best this is a solution searching for a problem, at worst it's an attempt to standardize our communication in a way which makes semantic meaning easier to analyse.


Agreed. Rather than something customers asked for, this feels like something driven by the culture at Google: "AI all the things" and "build new things" to get promoted.


You should receive the amount of corporate email I do. I don't want my email responses at work to be rich personal interactions. I want them to take me the least amount of time while still being useful.


Agree. A predictive keyboard is really helpful on mobile (the Windows Phone one was amazing) because typing on a small on-screen keyboard is inconvenient and some daily-life interactions are repetitive. On the email side it's the reverse: most (personal) mails are a priori different in content, and a physical keyboard doesn't justify an external helping system. Especially since, this being a Google product, you can be sure they will reuse everything you type to learn more about you.


When the majority of written communications are composed from a few selected branches, the potential storage and transmission savings are huge!


... I guarantee that if you draw that out for us, someone will 4-colour it for you. That's what "theorem" means.


How about this: when you get into the booth you're given an ID. You then cast your vote for Ms A, and you can see on the blockchain that it was recorded for A. After you cast, the system shows you a bunch of ids on the chain which voted for other candidates, and you can memorise one that voted for Mr B. When people come round, you tell them that other id. When they look it up they see that it voted for B and leave you alone.

That doesn't solve the potential problem of vote stuffing... Still thinking about that one.
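The scheme above can be sketched as a toy bulletin board (all names are hypothetical, and this ignores every hard part of real voting cryptography: it is only the decoy-ID lookup logic):

```python
import secrets

class ToyBulletinBoard:
    """Toy sketch of the decoy-ID idea; NOT a real voting protocol.

    Votes are published as (id, candidate) pairs. After casting, the
    voter is shown ids that voted for *other* candidates, and can quote
    one of those to a coercer instead of their own."""

    def __init__(self):
        self.board = {}  # voter id -> candidate

    def cast(self, candidate: str) -> str:
        """Record a vote and return the voter's fresh random id."""
        voter_id = secrets.token_hex(8)
        self.board[voter_id] = candidate
        return voter_id

    def decoys_for(self, my_id: str) -> list:
        """Ids on the board that voted for a different candidate than mine."""
        mine = self.board[my_id]
        return [i for i, c in self.board.items() if c != mine]

    def lookup(self, some_id: str) -> str:
        """What anyone (including a coercer) sees when checking an id."""
        return self.board[some_id]
```

Even as a toy it shows the weak spots: two coerced voters can end up quoting the same decoy id, and nothing here prevents stuffing the board with fabricated votes.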


> When people come round, you tell them that other id.

That doesn't work if someone else has already told them the same ID.


However in this setup, anyone can bring a smartphone in and take a picture of the screen when it displays anything that separates your id from a fake one.

Sounds like you might be on the right track if you can get over that hurdle somehow.


The smartphone problem exists currently with paper ballots. You're not supposed to be allowed to bring a camera into the voting booth, but this is not enforced particularly well.

One approach to this problem is to make it easy to cancel a previous ballot and submit a new one, so you can get your evidence that you voted the way e.g. your employer wanted you to, but then you can cancel it and vote with your conscience.


In the Brazilian voting machine this is handled. You type the numbers and it loads the candidate info on the screen. Once you press [Confirm], the screen goes blank with a success message, so the only way to take a picture is before the vote is actually processed. You have a [Reset] button to re-enter the numbers.


As far as I can remember from the instructions last time I voted, in Britain you can do this, just return the ballot to the person handing them out and say that you made a mistake, and they should issue you a replacement.



Correct me if I'm wrong, but isn't that entirely different on account of not 'suddenly' declaring your money fake and worthless, but rather allowing people to bring their money to a bank for other notes?


Assuming this is being sold to companies other than strictly self-improvement apps, how can you consider this to be an ethical thing to have created?


Hi!

Because we don't sell to apps that would hurt people. See manifesto on usedopamine.com/team/index.html.

~100 years ago, the most frequent causes of death in the US were pathogens for which we barely had a name. Pneumonia, flu, cholera, fevers. And it was only after we developed a rigorous technology of the body (modern medicine) did we lift millions of people out of suffering simultaneously.

Today, if you are under 50, you're most likely going to die of opiates. Over 50? Type-2 diabetes, stroke, cardiovascular disease, obesity and its complications, and stress-related illness.

Every single one of these has strong behavioral components.

Building a smartphone-first, AI-powered rigorous technology of the human mind gives us all an above-the-table, democratized chance at designing scalable technologies that stop this. It spreads better across national, sex, gender, and SES lines than most other behavior-change oriented solutions. And as we enter an age of an excess of cheap energy, food, and data, we NEED a rigorous way to help us better align modern aspirations with an ancient brainstem.


How can we trust you not to sell apps that hurt people? The road to hell is paved with good intentions.

Note that you're using the word 'addiction'. Many would argue that is hurting in and of itself - regardless of what one would be addicted to.

And that last paragraph is absolutely haunting. We need companies controlling our minds because our brainstem isn't evolved enough??

I hope you get sued into the ground. It's time to start holding people accountable for their effect on people's brains.

