XCSme's comments | Hacker News

I loved implementing the Rabin-Karp algorithm, such a fun and clever solution.
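For reference, the clever part is the rolling hash: a minimal Python sketch of my own (base and modulus chosen arbitrarily, not anyone's production code):

```python
# Rabin-Karp substring search with a polynomial rolling hash.
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 10**9 + 7) -> list[int]:
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # base^(m-1) mod mod, used to remove the leading character when rolling.
    high = pow(base, m - 1, mod)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # On a hash hit, confirm with a direct comparison to rule out collisions.
        if t_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the hash: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches
```

The fun bit is that each window's hash is computed in O(1) from the previous one, so the whole scan is O(n) expected instead of O(n*m).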

I tried it on their own website:

We couldn't scan this site: isitagentready.com returned 522.

The site appears to be experiencing server errors. This is not an agent-readiness issue. Try scanning again later.


I was initially excited by 4.7, as it does a lot better in my tests, but their reasoning/pricing is really weird and unpredictable.

Apart from that, in real-life usage, gpt-5.3-codex is ~10x cheaper in my case, simply because of the cached input discount (otherwise it would still be around 3-4x cheaper anyway).
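For intuition only, here is the cache-discount arithmetic with made-up numbers (every price below is a placeholder I invented, not a real rate):

```python
# All prices are hypothetical placeholders, NOT real Anthropic/OpenAI rates;
# they only illustrate how a large cached-input discount changes the cost ratio.
opus_input   = 15.0   # $/M input tokens (hypothetical, no cache discount)
codex_input  = 5.0    # $/M uncached input tokens (hypothetical)
codex_cached = 0.5    # $/M cached input tokens (hypothetical 90% discount)

tokens_m  = 10        # 10M input tokens over a long agent session
cache_hit = 0.8       # fraction of input tokens served from the cache

opus_cost  = tokens_m * opus_input
codex_cost = tokens_m * (cache_hit * codex_cached + (1 - cache_hit) * codex_input)

ratio_without_cache = opus_input / codex_input   # the ~3x case
ratio_with_cache    = opus_cost / codex_cost     # roughly 10x with these assumptions
```

In long agent sessions most of the context is resent on every turn, so the cache-hit fraction dominates the bill.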


The reasoning modes are really weird with 4.7

In my tests, asking for "none" reasoning resulted in higher costs than asking for "medium" reasoning...

Also, "medium" reasoning only had 1/10 of the reasoning tokens 4.6 used to have.


Medium reasoning has regressed since 4.6, while None and Max have improved in our benchmark. We suspect this is how Claude tries to cope with the increased user base. Note: Google and OpenAI probably did something similar long ago.

Oh, and also, the "none" and "medium" variants performed the same (??)

Insane! Even Haiku doesn't make such mistakes.

I am not sure it's a mistake; this might be their new "adaptive reasoning" + hidden reasoning trace, so we can't verify.

Claude is known for its shitty metering.

> Instruction following. Opus 4.7 is substantially better at following instructions. Interestingly, this means that prompts written for earlier models can sometimes now produce unexpected results: where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally. Users should re-tune their prompts and harnesses accordingly.

Yay! They finally fixed instruction following, so people can stop bashing my benchmarks[0] for being broken just because Opus 4.6 did poorly on them and then declared the tests themselves broken...

[0]: https://aibenchy.com/compare/anthropic-claude-opus-4-7-mediu...


Thank you!

One of the worst is TikTok. Even as a developer, when someone sends me a TikTok link and I have to visit it, I get stuck in the browser (same with the app, but I uninstalled it), and the way they trap you feels almost device-breaking.


TikTok is actually very adamant about booting me out of the browser

Initially I thought this was about their B2 file versions/backups, where they keep older versions of your files.

B2 is not a backup service. It’s an object storage service.

Weird, because in the Reddit thread linked above they call themselves a backup service.

I guess you were as confused as me, as I only associate Backblaze with B2; I haven't used any of their other services.

It just describes what's in the photo and then some completely wrong/random facts about self-esteem, income, religion, etc.

I guess writing code is now like creating punch cards for old computers. Or, more recently, like writing ASM instead of using a higher-level language like C. Now we simply write our "code" in a higher language, natural language, and the LLM is the compiler.

> Now we simply write our "code" in a higher language, natural language, and the LLM is the compiler.

No, we don't, and we never should, actually: compilers need to be deterministic.


It needs to be something stronger than just deterministic.

With the right settings, an LLM is deterministic. But even then, small variations in input can cause very unforeseen changes in output, sometimes drastic, sometimes minor. Knowing that I'm likely misusing the vocabulary, I would say this counts as the output being chaotic, so we need compilers to be non-chaotic (and deterministic; I think you might be able to have something that is non-deterministic and non-chaotic). I'm not sure that a non-chaotic LLM could ever exist.

(Thinking on it a bit more, there are some esoteric languages that might be chaotic, so this might be more difficult to pin down than I thought.)
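A toy way to see "deterministic but chaotic": in the sketch below a hash stands in for a temperature-0 LLM. It is fully deterministic, yet a one-character change to the input flips most of the output. (The hash is only an analogy I'm using for the sensitivity, not a claim about how LLMs work.)

```python
import hashlib

def toy_model(prompt: str) -> str:
    # Stand-in for a fully deterministic model: same input, same output,
    # but wildly sensitive to tiny input perturbations.
    return hashlib.sha256(prompt.encode()).hexdigest()

a = toy_model("sort this list ascending")
b = toy_model("sort this list ascending.")  # one extra character

# Deterministic: repeated calls agree exactly.
same = a == toy_model("sort this list ascending")
# Chaotic: the two near-identical inputs disagree in most hex positions.
diff = sum(x != y for x, y in zip(a, b))
```

A real compiler is the opposite: a one-character change to the source produces a correspondingly local, predictable change in the output.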


Why?

Also, give the same programming task to 2 devs and you end up with 2 different solutions. Heck, have the same dev do the same thing twice and you will have 2 different ones.

Determinism seems like this big gotcha, but in itself, is it really?


> Heck, have the same dev do the same thing twice and you will have 2 different ones

"Do the same thing" I need to be pedantic here because if they do the same thing, the exact same solution will be produced.

The compiler needs to guarantee that across multiple systems. How would QA know they're testing the version that is staged to be pushed to prod if you can't guarantee it's the same?
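In practice that guarantee usually comes from comparing content digests of the artifact rather than trusting labels; a minimal Python sketch (the helper name is my own):

```python
import hashlib

def artifact_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a build artifact."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large artifacts don't need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```

QA and prod can each compute the digest independently: identical bytes give an identical digest, so both sides know they are looking at the exact same build. With a nondeterministic "compiler", two builds from the same source would not even hash the same.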


This is not what a compiler is in any sense.

I cringe every time I read this "punch card" narrative. We are not at this stage at all. You are comparing deterministic stuff and LLMs which are not deterministic and may or may not give you what you want. In fact I personally barely use autonomous Agents in my brownfield codebase because they generate so much unmaintainable slop.

Except that this compiler is a non-deterministic pull of a slot-machine handle. No thanks, I'll keep my programming skills; COBOL programmers command a huge salary in 2026, and soon all competent programmers will.

Releasing version 9.0 of my self-hosted analytics app[0]. I will finally add an in-app cron job editor, so you can easily schedule clean-up jobs, data retention settings, newsletters/summaries, etc.

[0]: https://www.uxwizz.com

