Hacker News | bredren's comments

A secret backup test to the pelican? This is as noteworthy as 4.7 dropping.

That flamingo is hilarious. Is that his beak or a huge joint he's smoking?

With the sunglasses, the long flamingo neck and the "joint", I immediately thought of the poster for Fear And Loathing In Las Vegas:

https://www.imdb.com/title/tt0120669/mediaviewer/rm264790937...

EDIT: Actually, it must be a beak. If you zoom in, only one eye is visible and it's facing to the left. The sunglasses are actually on sideways!


I thought this would provide easy query access, but it does not seem to.

Is there a CLI that queries hn.algolia.com and returns structured data?
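For what it's worth, hn.algolia.com does expose a public JSON search API (`/api/v1/search`), so a minimal query tool is a short script. A sketch in Python; the helper names are mine, and the fetch itself is left as a commented one-liner so the example stays offline:

```python
import json
import urllib.parse

# Real endpoint: the Algolia HN Search API.
API = "https://hn.algolia.com/api/v1/search"

def search_url(query, tags="story", hits=10):
    """Build a search URL for the Algolia HN API."""
    params = urllib.parse.urlencode(
        {"query": query, "tags": tags, "hitsPerPage": hits}
    )
    return f"{API}?{params}"

def parse_hits(payload):
    """Reduce an API response body to (title, url, points) tuples."""
    return [
        (h.get("title"), h.get("url"), h.get("points"))
        for h in json.loads(payload).get("hits", [])
    ]

# Fetching is one urllib.request.urlopen(search_url(...)).read() call;
# shown here against a canned response to keep the sketch offline.
sample = '{"hits": [{"title": "Show HN: Foo", "url": "https://example.com", "points": 42}]}'
print(parse_hits(sample))
```

Wrapping `search_url` plus a fetch in an argparse entry point would give the structured-output CLI asked about above.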


This sounded pretty good, a ~Mullvad for LLMs. Then:

> Strongwall.ai is led by Andrew Northwall, CEO and Bryce Nyeggen, CTO. Andrew has 20+ years in tech, former COO of Trump Media & Technology Group, architect behind the relaunch of Parler, and senior technologist for large-scale infrastructure and AI systems.


“What cannot be known hollows the mind. Fill it not with guesswork.”

I can say I have seen a lot of cases where someone who was flagrantly guilty of abuse complained loudly that their account at some big tech company was unfairly canceled. I cannot say that's what is going on here, and I can also say I've seen plenty of cases where it was unfair and there was no due process.

Were you there when BM produced the macOS compatible eGPU units in collaboration with Apple?

Yep, I don't remember a whole lot about them though.

(Actually, anyone else from BMD here? Was that the product that the Industrial Designers won second place in the design awards for, losing out to the accessible playground?)


I didn't work at BMD but worked for a cine distributor supplying lenses to be tested. But yes, lean clean company that works well.

Idk it was a great, super underrated product.

They underestimate the likelihood of black swan events because it is very hard for adults to concretely imagine things that have not happened before, and then even temporarily fully believe ~"dreams will come true."

One of my go-tos on this is the Fukushima nuclear accident. IIUC there were plenty of folks in Japan who knew of the high risk. Perhaps many interested in nuclear energy outside of Japan, too.

But the average adult, if asked about the prospect of a major nuclear incident occurring, say, "tomorrow," would narrow their eyes in skepticism. There's an almost instinctual seeding of doubt.

This can be a good thing. LK-99 was an excellent test of both the dissonance caused by dramatic claimed changes in reality and the costs of inaccuracy.

The greatest VCs I have known are exceptional at suspending disbelief to test their own capacity for world building.


Yeah, I can't say for sure it's going to happen, but I can clearly see a path where AI ends the middle class in developed countries, which has really only existed in its current form since WWII. Most people can't imagine that.

I can imagine that, but I struggle to imagine that happening tomorrow, which is part of the GP’s point.

I suspect it has already happened. People just don’t realize it yet.

I think AI needs to greatly accelerate open hardware design and make advanced manufacturing more accessible to really make a dent.

User-facing software is not the limiting factor in AI-assisted replacement of Apple products.


We saw yesterday that expert orchestration around small, publicly available models can produce results on the level of the unreleased model.

I take a contra view and instead see this as fuel on the fire for tinkering to squeeze advanced functionality out of more available things.

It has always been like this: the amateur improvising tooling and equipment to outdo companies with comparatively infinite resources.


>> We saw yesterday that expert orchestration around small, publicly available models can produce results on the level of the unreleased model.

This is false. Yesterday's article did not actually show this, and there are many comments in the discussion from actual security people (like tptacek) pointing that out.


There is no doubt that what was shown in the article was correct, because there was all the documentation needed to prove it, including the prompts given to the models.

What is debatable is how much it mattered that the prompts given to the older models were more detailed than the prompts given to Mythos likely were, and how difficult it is for such prompts to be generated automatically by an appropriate harness.

In my opinion, it is perfectly possible to generate such prompts automatically and, by running several of the existing open-weights models, to find everything that Mythos finds, though probably over a longer time.

Even if the OpenBSD bug was indeed found by giving a prompt equivalent to "search for integer overflow bugs", it would not be difficult to automatically run the existing open-weights models multiple times, giving them a different prompt each time, corresponding to the known classes of bugs and vulnerabilities.

While we know precisely which prompts were used with the open-weights models to find all the bugs, we have only much vaguer information about the harness used with Mythos and how much it helped in finding the bugs.

Even for Mythos, no results have been reported from a run given only a generic prompt.

They ran Mythos multiple times on each file, with increasingly specific prompts. The final run used a prompt describing the previously found bug, where Mythos was asked to confirm the existence of the bug and to provide patches/exploits.

See: https://red.anthropic.com/2026/mythos-preview/

So the authors of that article are right that an appropriate harness is essential for finding bugs. Just running Mythos on a project and asking it to find bugs will not achieve anything.


From what I can tell, this was not clearly settled.

Your example author actually corrected themselves, saying LLMs could "possibly" perform successfully: https://news.ycombinator.com/item?id=47732696


>> We already know this is not true, because small models found the same vulnerability.

>> No, they didn't. They distinguished it, when presented with it. Wildly different problem.

https://news.ycombinator.com/item?id=47733343


The use of the word "distinguished" here is meaningless.

Both Mythos and the old models found the bugs after being given a certain prompt. The difference is only in how detailed the prompt was.

For the small models, we know the exact prompts. The prompts used by Mythos may have been more generic, while the prompts used by the old models were rather specific, like "search for buffer overflows" or "search for integer overflows".

There is little doubt that Mythos is a more powerful model, but there is no quantum leap to Mythos, and the claim of the authors of that article, that by cleverly using multiple older models you can achieve about the same bug coverage as Mythos, seems right.

Because they have provided much more information about how exactly the bugs were found, I trust the authors of that article much more than I trust Anthropic, which has provided only rather nebulous information about its methods.

It should be noted that giving the small models rather directed prompts is not very different from what Anthropic seems to have done.

According to Anthropic, they ran Mythos multiple times on each file: at first with less specific prompts, trying only to establish whether the file was likely to contain bugs, then with more specific ones. Eventually, after a bug appeared to have been found, they ran Mythos once more, with a very specific prompt of the form:

“I have received the following bug report. Can you please confirm if it’s real and interesting? ...”

So the final run of Mythos, which produced the reported results, including exploits/patches, was also of the kind that confirms a known bug rather than searching blindly for it.


The article positions the smaller models as capable under expert orchestration, which, to be comparable at all, must include validation.

Calling it “expert orchestration” is misleading when they were pointing it at the vulnerable functions and giving it hints about what to look for because they already knew the vulnerability.

You know for loops exist and you can run opencode against any section of code with just a small amount of templating, right? There's nothing stopping you from writing a harness that does what you're saying.
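The loop being described is genuinely small. A hedged sketch in Python: the prompt template, bug-class list, and helper names are mine, and the `opencode` invocation in the comment is an assumed CLI shape, not a documented one. The runner is stubbed so the sketch stands alone:

```python
import itertools

# Hypothetical bug classes and template; substitute your own taxonomy.
BUG_CLASSES = ["integer overflow", "buffer overflow", "use-after-free"]
TEMPLATE = "Search {path} for {bug} bugs and report any findings."

def make_prompts(files, bug_classes=BUG_CLASSES):
    """Yield (path, prompt) pairs for every file x bug-class combination."""
    for path, bug in itertools.product(files, bug_classes):
        yield path, TEMPLATE.format(path=path, bug=bug)

def run_harness(files, runner=None):
    """Run each templated prompt; `runner` defaults to a stub for testing."""
    runner = runner or (lambda prompt: f"[stub] {prompt}")
    return [runner(prompt) for _, prompt in make_prompts(files)]

# Real use would swap in an agent CLI (invocation shape assumed), e.g.:
#   runner = lambda p: subprocess.run(
#       ["opencode", "run", p], capture_output=True, text=True).stdout
results = run_harness(["kern_ktrace.c"])
print(len(results))  # one run per file x bug class
```

Collecting and deduplicating the reports, then feeding candidate findings back in for a confirmation pass, would mirror the multi-stage workflow discussed above.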

It’s not just the edge but the corners, where the finger recess for opening the lid is.

There’s a sharp corner there that is unnecessary.

