
https://i.imgur.com/xsFKqsI.png

"Draw a picture of a full glass of wine, ie a wine glass which is full to the brim with red wine and almost at the point of spilling over... Zoom out to show the full wine glass, and add a caption to the top which says "HELL YEAH". Keep the wine level of the glass exactly the same."


Maybe the "HELL YEAH" added a "party implication" which shifted its "thinking" into just-correct-enough latent space that it was able to actually hunt down some image somewhere in its training data of a truly full glass of wine.

I almost wonder if prompting it "similar to a full glass of beer" would get it shifted just enough.


Can't replicate. Maybe the rollout is staggered? Using Plus from Europe, it's consistently giving me a half full glass.


I am using Plus from Australia, and while I am not getting a full glass, nor am I getting a half full glass. The glass I'm getting is half empty.


Surprised it isn't fully empty for being upside down!


That's funny. HN hates funny. Enjoy your shadowban.


Yeah. I understand that this site doesn’t want to become Reddit, but it really has an allergy to comedy, and it’s sad. God forbid you use sarcasm: half the people here won’t understand it, and the other half will say it’s not appropriate for healthy discussion…


Good example in this very discussion: https://news.ycombinator.com/item?id=43477003


I like this site, but it can become inhuman sometimes.

People get upvoted for pedantry rather than furthering a conversation, e.g.


Is it drawing the image from top to bottom very slowly over the course of at least 30 seconds? If not, then you're using DALL-E, not 4o image generation.


This top to bottom drawing – does this tell us anything about the underlying model architecture? AFAIK diffusion models do not work like that: they denoise the full frame over many steps. In the past there were attempts to synthesize a picture slowly by predicting the next pixel, but I wasn't aware of a shift to that kind of architecture at OpenAI.


Yes, the model card explicitly says it's autoregressive, not diffusion. And it's not a separate model; it's a native ability of GPT-4o, which is a multimodal model. They just didn't make this ability public until now. I assume they worked on fine-tuning to improve prompt following.
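A minimal sketch of why autoregressive decoding would produce that top-to-bottom reveal, assuming a hypothetical `model` callable that returns next-token logits over an image-token vocabulary (the real GPT-4o decoder is not public, so every name here is illustrative):

```python
import numpy as np

def generate_image_tokens(model, height=24, width=24, temperature=1.0):
    """Toy autoregressive decode of an image-token grid in raster order.

    Unlike diffusion, which refines the whole frame over many denoising
    steps, an autoregressive decoder emits one token at a time from the
    top-left to the bottom-right, so a partial decode is a finished top
    strip over a blank bottom -- matching the top-to-bottom reveal.
    """
    rng = np.random.default_rng(0)
    tokens = []
    for _ in range(height * width):
        logits = model(tokens)              # condition on everything emitted so far
        probs = np.exp(logits / temperature)
        probs /= probs.sum()                # softmax over the token vocabulary
        tokens.append(rng.choice(len(probs), p=probs))
    return np.array(tokens).reshape(height, width)
```

Each sampled grid cell would then be mapped back to pixels by a separate decoder; the point is only that generation order is spatial, row by row.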


Apparently it's not diffusion, but tokens.


Works for me as well https://chatgpt.com/share/67e3f838-63fc-8000-ab94-5d10626397...

USA, but VPN set to exit in Canada at time of request (I think).


The EU got the drunk version. And a good drunk knows never to top off a glass of wine. In that context the glass is already "full".

But aside from that, it would only be comparable if we could compare your prompts.


Maybe it's half empty.


ha


You might still be on DALL-E. Mine still is when I use ChatGPT.

I switched over to the sora.com domain and now I have access to it.


The free site even has it. Just don't turn on image generation; it works with that setting off. If you enable it, it uses DALL-E.


Most interesting thing to me is the spelling is correct.

I'm not a heavy user of AI or image generation in general, so is this also part of the new release or has this been fixed silently since last I tried?


It very much looks like a side effect of this new architecture. In my experience, text looks much better in recent DALL-E images (so what ChatGPT was using before), but it is still noticeably mangled when printing more than a few letters. This model update seems to improve text rendering by a lot, at least as long as the content is clearly specified.

However, when giving a prompt that requires the model to come up with the text itself, it still seems to struggle a bit, as can be seen in this hilarious example from the post: https://images.ctfassets.net/kftzwdyauwt9/21nVyfD2KFeriJXUNL...


The periodic table is absolutely hilarious, I didn't know LLMs had finally mastered absurdist humor.


Yeah who wouldn't love a dip in the sulphur pool. But back to the question, why can't such a model recognize letters as such? It cannot be trained to pay special attention to characters? How come it can print an anatomically correct eye but not differentiate between P and Z?


I think the model has not decided if it should print a P or a Z, so you end up with something halfway between the two.

It's a side effect of the entire model being differentiable: there is always some halfway point.
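A toy illustration of that "always some halfway point" idea: interpolate linearly between two crude glyph bitmaps (invented here, not from any real font or model). A differentiable model can slide smoothly from one to the other, and every intermediate output is a mush that is neither letter:

```python
import numpy as np

# Two crude 5x5 glyph bitmaps, purely illustrative.
P = np.array([[1, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0]], dtype=float)

Z = np.array([[1, 1, 1, 1, 1],
              [0, 0, 0, 1, 0],
              [0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0],
              [1, 1, 1, 1, 1]], dtype=float)

def blend(alpha: float) -> np.ndarray:
    """Differentiable path between the glyphs: alpha=0 is P, alpha=1 is Z,
    and every alpha in between is a half-inked in-between of both."""
    return (1 - alpha) * P + alpha * Z
```

A discrete choice ("print P" or "print Z") has no such middle ground, which is why undecided text comes out as smeared pseudo-letters rather than a wrong but clean letter.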


The head of foam on that glass of wine is perfect!


I think we're really fscked, because even AI image detectors think the images are genuine. They look great in Photoshop forensics too. I hope the arms race between generators and detectors doesn't stop here.


We're not. This PNG image of a wine glass has JPEG compression artefacts leaking in from JPEG training data. Zoom in and you will see the 8x8 block boundaries used by JPEG compression, which simply cannot occur in a native PNG. This is a common method for detecting AI-generated images, and it still works: no need for complex Photoshop forensics or AI detectors, just zoom in and check for compression artefacts. Current models are incapable of getting this right. All the compression algorithms are mixed and mashed together in the training data, so in a generated image you can find artefacts from almost all of them if you're lucky, but JPEG is prevalent, obviously, since lossless images are rare online.
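That zoom-in check can be sketched numerically, assuming nothing beyond NumPy: measure whether pixel differences straddling 8x8 block boundaries are systematically larger than differences inside blocks. This is a rough heuristic for illustration, not a forensic tool:

```python
import numpy as np

def blockiness_score(gray: np.ndarray) -> float:
    """Ratio of pixel differences across 8x8 block boundaries vs inside blocks.

    gray: 2-D array of luma values. Compares the mean absolute
    column-to-column difference at columns 7|8, 15|16, ... against the
    mean everywhere else. Ratios well above 1.0 suggest JPEG-style
    blocking, even in a file that was saved losslessly as PNG.
    """
    diffs = np.abs(np.diff(gray.astype(float), axis=1))  # shape (H, W-1)
    cols = np.arange(diffs.shape[1])
    boundary = diffs[:, cols % 8 == 7]   # differences straddling a block edge
    interior = diffs[:, cols % 8 != 7]
    return boundary.mean() / interior.mean()
```

On a real file you would pass in the decoded luma channel (e.g. via Pillow's `Image.convert("L")`) and ideally repeat the check along rows; a natural photo or a clean PNG should score near 1.0.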


If JPEG compression is the only evident flaw, this kind of reinforces my point, as most of these images will end up shared as processed JPEG/WebP on social media.


You didn't get it. The image contains ALL compression artifacts from different algorithms mashed up in a single picture, the JPEG is just prevalent.


Oh, I see. There's still room for reliable detection then.


Plenty of real PNG images have JPEG artifacts because they were once JPEGs off someone's phone...


So I'm one of these seemingly non-verbal thinkers, including when I code.

I think it makes me more capable of _making use of_ complex concepts. I came into programming through mathematics, and I treat them both as aesthetic exercises. When I'm building a system in my head the solution usually appears visually, and ideas overlay themselves over the problem as aesthetic "feels". Yes it's a lot like being a visual designer: I can step back, view the solution, and just 'see' if it looks right.

Why should we structure our solution like this? I can't easily put it into words but it just... would be more natural like this. And then a few days later the reason it was correct becomes liminal and I can explain it properly. It lets me hold more ideas in my head and make use of them all at once. When picking up a new idea I can grasp the underlying concept, see the symmetry with ideas I already understand, and slot it into place.

Of course it has major downsides too. It's an effort for me to put my full ideas into words. Coding, like anything worth doing, is a team sport. If I can't vocalize my ideas then half the time that makes them worthless, especially when the decisions are important and therefore contested. I tend to make mental jumps that lose other people, and lose track of what state other people have.

Also, and this is in line with what you said Aedron, it does make it harder to check the details. I'll make silly mistakes because checking them isn't part of my mental construct. I can chase a half-formed idea for a day before realizing my mental picture was off, and I didn't catch it because I never put the problem into words. Practical-but-ugly hacks don't occur to me because they aren't aesthetic. I'm worthless at remembering my girlfriend's friends' names.

This year I'm focusing on moving slower, writing more things down, and talking to people more. So far it's been really helpful. But I don't think I'd have got to where I am now, or be able to solve the kinds of problems I do, if I was a mostly verbal thinker.


Podcast pro tip: listen on 0.8x speed. Everything becomes so dreamy and sleepy sounding.


People are saying this is 'basically the same as autocorrect or predictive text'. It's not. Autocorrect doesn't make any creative decisions for you, whereas this does.

That is to say: we think at the level of words, not letter-by-letter. When I make a typo, autocorrect corrects what my hands do to match what my brain is thinking. My brain still has primacy. This thing sits at the level of words and even sentences: if it's autocorrect, it's working to correct what my brain is thinking. Which is creepy and sad.

It's a little bit more like predictive text I admit. But because predictive text only suggests one word at a time, there's little semantic meaning to a suggestion and it's rare that I have my thoughts distracted or changed because of it. It's still largely a convenience tool. Suggesting a full sentence is shaping the direction of your thought, which is very different.

I'm still horrified that Google has put this out.


This is awful. We need _less_ mediation and commodification of our personal interactions, not more. What is the use of this? At best this is a solution searching for a problem, at worst it's an attempt to standardize our communication in a way which makes semantic meaning easier to analyse.


Agreed. Rather than something customers asked for, this feels like something driven by the culture at Google: "AI all the things" and "build new things" to get promoted.


You should receive the amount of corporate email I do. I don't want my email responses at work to be rich personal interactions. I want them to take me the least amount of time while still being useful.


Agree. A predictive keyboard is really helpful on mobile (the Windows Phone one was amazing) because typing on a small on-screen keyboard is inconvenient and some daily-life interactions are repetitive. On the email side it's the reverse: most (personal) mails are a priori different in content, and a physical keyboard doesn't justify an external helping system. Especially since, this being a Google product, you can be sure they will reuse everything you type to learn more about you.


When the majority of written communications are composed from a few selected branches, the potential storage and transmission savings are huge!


... I guarantee that if you draw that out for us, someone will 4-colour it for you. That's what "theorem" means.


How about this: when you get into the booth you're given an ID. You then cast your vote for Ms A, and you can see on the blockchain that it was recorded for A. After you cast, the system shows you a bunch of ids on the chain which voted for other candidates, and you can memorise one that voted for Mr B. When people come round, you tell them that other id. When they look it up they see that it voted for B and leave you alone.

That doesn't solve the potential problem of vote stuffing... Still thinking about that one.
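The scheme above can be sketched as a toy bulletin board (all names are hypothetical, and this ignores every hard part of real voting cryptography: it is only the decoy-ID lookup logic):

```python
import secrets

class ToyBulletinBoard:
    """Toy sketch of the decoy-ID idea; NOT a real voting protocol.

    Votes are published as (id, candidate) pairs. After casting, the
    voter is shown ids that voted for *other* candidates, and can quote
    one of those to a coercer instead of their own."""

    def __init__(self):
        self.board = {}  # voter id -> candidate

    def cast(self, candidate: str) -> str:
        """Record a vote and return the voter's fresh random id."""
        voter_id = secrets.token_hex(8)
        self.board[voter_id] = candidate
        return voter_id

    def decoys_for(self, my_id: str) -> list:
        """Ids on the board that voted for a different candidate than mine."""
        mine = self.board[my_id]
        return [i for i, c in self.board.items() if c != mine]

    def lookup(self, some_id: str) -> str:
        """What anyone (including a coercer) sees when checking an id."""
        return self.board[some_id]
```

Even as a toy it shows the weak spots: two coerced voters can end up quoting the same decoy id, and nothing here prevents stuffing the board with fabricated votes.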


> When people come round, you tell them that other id.

That doesn't work if someone else has already told them the same ID.


However in this setup, anyone can bring a smartphone in and take a picture of the screen when it displays anything that separates your id from a fake one.

Sounds like you might be on the right track if you can get over that hurdle somehow.


The smartphone problem exists currently with paper ballots. You're not supposed to be allowed to bring a camera into the voting booth, but this is not enforced particularly well.

One approach to this problem is to make it easy to cancel a previous ballot and submit a new one, so you can get your evidence that you voted the way e.g. your employer wanted you to, but then you can cancel it and vote with your conscience.


In the Brazilian voting machine this is handled. You type the numbers and it loads the candidate info on the screen. Once you press [Confirm], the screen goes blank with a success message, so the only way to take a picture is before the vote is actually processed. You have a [Reset] button to re-enter the numbers.


As far as I can remember from the instructions last time I voted, in Britain you can do this, just return the ballot to the person handing them out and say that you made a mistake, and they should issue you a replacement.



Correct me if I'm wrong, but isn't that entirely different on account of not 'suddenly' declaring your money fake and worthless, but rather allowing people to bring their money to a bank for other notes?


Assuming this is being sold to companies other than strictly self-improvement apps, how can you consider this to be an ethical thing to have created?


Hi!

Because we don't sell to apps that would hurt people. See manifesto on usedopamine.com/team/index.html.

~100 years ago, the most frequent causes of death in the US were pathogens for which we barely had a name. Pneumonia, flu, cholera, fevers. And it was only after we developed a rigorous technology of the body (modern medicine) did we lift millions of people out of suffering simultaneously.

Today, if you are under 50, you're most likely going to die of opiates. Over 50? Type-2 diabetes, stroke, cardiovascular disease, obesity and its complications, and stress-related illness.

Every single one of these has strong behavioral components.

Building a smartphone-first, AI-powered rigorous technology of the human mind gives us all an above-the-table, democratized chance at designing scalable technologies that stop this. It spreads better across national, sex, gender, and SES lines than most other behavior-change oriented solutions. And as we enter an age of an excess of cheap energy, food, and data, we NEED a rigorous way to help us better align modern aspirations with an ancient brainstem.


How can we trust you not to sell apps that hurt people? The road to hell is paved with good intentions.

Note that you're using the word 'addiction'. Many would argue that is hurting in and of itself - regardless of what one would be addicted to.

And that last paragraph is absolutely haunting. We need companies controlling our minds because our brainstem isn't evolved enough??

I hope you get sued into the ground. It's time to start holding people accountable for their effect on people's brains.

