
> With Opus 4.5, Claude Code feels like having a god-level engineer beside you. Opinionated but friendly. Zero ego.

Who keeps forgetting variable names and calling conventions it used 4 seconds ago, while using 136 GB of RAM for the CLI and forcing you to frequently force-quit the whole terminal? It's not even human-level.


And then hallucinating APIs that don't exist, breaking all the unit tests and giving up, saying they're an "implementation detail", and over-engineering a horrific class that makes Enterprise FizzBuzz look reasonable.

Except a god-level engineer wouldn't write unit tests that pass but don't actually test anything because it mocked the responses instead of testing the _actual_ responses, so your app is still broken despite tests passing and "victory!" claims by the "engineer".

Just one example of many personal experiences.

It is helpful, and very, very fast at looking things up, sifting through logs and documentation to figure out a bug, writing ad-hoc scripts, and researching solutions; but it's definitely junior-level when it comes to reasoning, so you really have to keep your thinking cap on and guide it.


I've been running Claude Code on a 13-year-old potato and it's never used 136 GB of RAM - possibly because I only have 8 GB.

It's VRAM or something; it makes the OS completely busy even though I have only 32 GB of RAM. Task manager shows 100+ GB, forcing me to terminate it.

Is that VRAM on your GPU? I don't think Claude Code uses that.

Not on the GPU; I think it's just paged memory. You're right that claude-code isn't running the model locally. I've had to kill it 5 times so far today.

edit: https://ibb.co/Fbn8Q3pb

that's the 6th


Why do you think it's Claude and not iTerm?

I've been using iTerm for 10 years and didn't update it recently; Claude Code is the only new factor in my setup. I can visibly predict when it's about to happen as I'm using Claude Code (when the conversation goes above 200 messages and it starts using sub-agents, leading to somehow infinite re-rendering of the message timeline, and they seemingly use an HTML-to-bash rendering thing because ...), so yeah, maybe you're right that iTerm can't handle that re-rendering, or maybe the monitor is broken.

I use xterm, and the visual glitch doesn't crash anything, so maybe try that? I suspect, though, that you're using much longer sessions than I do, with the talk of sub-agents and all.

I've mostly just been using it for single features and then often just quitting it until I have the next dumb idea to try out.


Context is garbage in, garbage out.

My entire codebase is in a certain style that's very easy to infer from just looking around in the same file, yet Claude Code routinely makes up its own preferences and doesn't respect the style even given an instruction in CLAUDE.md. Claude Code brings its own garbage even when there's plenty of my own garbage to glean from. That's not what GIGO is supposed to be.

There is a reasonable argument that your question is at least in NP, and plausibly NP-hard or harder, depending on how you formalize the verification oracle.

Their CLI agent only takes 136 GB of RAM and is now giving head-to-head competition to the Chrome browser.


happens all the time



Ha. I even gave it a serious response!


Wow, so weird! Is this some roundabout indie-hacker marketing trick?


Yeah, super shady. Perhaps they're collecting data points for a product or else generating content for a blog post about how lonely building solo is.


Logging is a hack now?


There is always a burden of deploying assets.


Each passion project should be built in the language most suitable for that project, not based on love.


They haven't listed SES in the affected services on their status page yet.


The problem is that this "comparison" is being used both ways: on one hand, LLM leaders tell you it's "smarter than the smartest", and then when it makes pretty obvious mistakes, the same leaders say that even an "average" (dumb) human can/will make the same mistake.


Why not both?

LLMs have jagged capabilities, as AIs tend to do. They go from superhuman to more inept than a 10-year-old and back on a dime.

Really, for an AI system, the LLMs we have are surprisingly well rounded. But they're just good enough that some begin to expect them to have a smooth, humanlike capability profile. Which is a mistake.

Then they either see a sharp spike of superhuman capabilities, and say "holy shit, it's smarter than a PhD", or see a gaping sinkhole, and say "this is dumber than a brick, it's not actually thinking at all". Both are wrong but not entirely wrong. They make the right observations and draw the wrong conclusions.


It cannot be both. A system with superhuman capabilities cannot consistently make basic mistakes (like forgetting a name as it moves from generating the 1st line to the 3rd).

LLMs are a great tool, but the narrative around them is not healthy and will burn a lot of real users.


> A system with superhuman capabilities cannot make basic mistakes consistently

That sounds like a definition you just made up to fit your story. A system can both make bigger leaps in a field the smartest human is unfamiliar with and make dumber mistakes than a 10-year-old. I can say that confidently, because we have such systems. We call them LLMs.

It's like claiming that it can't both be sunny and rainy. Nevertheless, it happens.


Yeah, I don't know what your definition of human is, but in mine, when comparing something to an average human, remembering a name is an innate quality. If a human consistently forgets names, I will think something is wrong with that human that prevents them from remembering names.


I think you should work with a bunch of highly respected PhD researchers. This is a quality many share - the classic "can solve super hard problems but can't tie their shoes" is a trope because versions of it ring true. This is not to say what LLMs are doing is thinking per se, but what we do isn't magic either. We just haven't explained all the mechanisms of human thought yet. How much overlap there is between the two is up for debate, considering how little actual thinking people do day to day; most folks are almost always just reacting to stimuli.


Would you then dismiss this same human who forgets names as dumb and unthinking if he can handle quantum physics effortlessly?


If I had to fight Deep Blue and win? I'd pick a writing contest over a game of chess.

For AIs, having incredibly narrow capabilities is the norm rather than an exception. That doesn't make those narrow superhuman AIs any less superhuman. I could spend a lifetime doing nothing but learning chess and Deep Blue would still kick my shit in on the chessboard.


I think the capability of something or somebody, in a given domain, is mostly defined by their floor, not their ceiling. This is probably true in general, but with LLMs it's extremely true due to their self-recursion. Once they get one thing wrong, they tend to start basing other things on that falsehood, to the point that you're often far better off starting with a new context than trying to correct them.

With humans we don't really have to care about this because our floor and our ceiling tend to be extremely close, but obviously that's not the case for LLMs. This is made especially annoying with ChatGPT, which seems intentionally designed to convince you that you're the most brilliant person to have ever lived, even when what you're saying/doing is fundamentally flawed.


Consistency drive. All LLMs have a desire for consistency, right at the very foundation of their behavior. The best tokens to predict are the ones that are consistent with the previous tokens, always.

Makes for a very good base for predicting text. Makes them learn and apply useful patterns. Makes them sharp few-shot learners. Not always good for auto-regressive reasoning though, or multi-turn instruction following, or a number of other things we want LLMs to do.

So you have to un-teach them maladaptive consistency-driven behaviors - things like defensiveness or error amplification or loops. Bring out consistency-suppressed latent capabilities - like error checking and self-correction. Stitch it all together with more RLVR. Not a complex recipe, just hard to pull off right.
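To make "consistent with the previous tokens" concrete, here is a toy sketch of greedy autoregressive decoding; next_token_logits is a hypothetical stand-in for the model, not any real API:

    from typing import Callable, List

    def generate(next_token_logits: Callable[[List[int]], List[float]],
                 prompt: List[int], max_new_tokens: int) -> List[int]:
        tokens = list(prompt)
        for _ in range(max_new_tokens):
            logits = next_token_logits(tokens)   # conditioned on the entire prefix so far
            best = max(range(len(logits)), key=logits.__getitem__)
            tokens.append(best)                  # this choice is now part of the prefix
        return tokens

Because step t conditions on everything before it, a wrong token at step t-1 is the most "consistent" thing to keep building on, which is exactly the error-amplification loop described above.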


LLMs have no desire for anything. They're algorithms, and this anthropomorphization is nonsense.

And no, the best tokens to predict are not "consistent", based on what the algorithm would perceive, with the previous tokens. The goal is for them to be able to generate novel information and self-expand their "understanding". All you're describing is a glorified search/remix engine, which indeed is precisely what LLMs are, but not what the hype is selling them as.

In other words, the concept of the hype is that you train them on the data just before relativity and they should be able to derive relativity. But of course that is in no way consistent with the past tokens, because it's an entirely novel concept. You can't simply carry out token prediction; you actually have to have some degree of logic, understanding, and so on - things which are entirely absent, probably irreconcilably so, from LLMs.


Not anthropomorphizing LLMs is complete and utter nonsense. They're full of complex behaviors, and most of them are copied off human behavior.

It seems to me like this is just some kind of weird coping mechanism. "The LLM is not actually intelligent" because the alternative is fucking terrifying.


No they are not copied off of human behavior in any way shape or fashion. They are simply mathematical token predictors based on relatively primitive correlations across a large set of inputs. Their success is exclusively because it turns out, by fortunate coincidence, that our languages are absurdly redundant.

Change their training content to e.g. stock prices over time and you have a market prediction algorithm. That the next token being predicted is a word doesn't suddenly make them some sort of human-like or intelligent entity.
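Just to illustrate the "swap the training data" point (everything below is made up for the example), only the tokenization step changes; the task stays "predict the next token given the previous ones":

    prices = [101.2, 101.5, 101.4, 102.0, 101.8]

    def to_tokens(series, bucket=0.5):
        # map each price move to an integer bucket (0 = roughly flat, 1 = small rise, ...)
        return [round((b - a) / bucket) for a, b in zip(series, series[1:])]

    tokens = to_tokens(prices)  # [1, 0, 1, 0]
    # feed `tokens` to the same next-token predictor; the architecture neither
    # knows nor cares that these were ever prices rather than words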


"No they are not copied off of human behavior in any way shape or fashion."

The pre-training phase produces the next-token predictors. The post-training phase is where it's shown examples of selected human behavior to imitate - examples of conversation patterns, expert code production, how to argue a point... there's an enormous amount of "copying human behavior" involved in producing a useful LLM.


Why single out SFT?

It's not like the pre-training dataset didn't contain any examples of human behaviors for an LLM to copy.

SFT is just a more selective process. And a lot of how it does what it does is less "teach this LLM new tricks" and more "teach this LLM how to reach into its bag of tricks and produce the right tricks at the right times".
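One way to see why SFT isn't a fundamentally different mechanism: under the usual framing, both phases minimize the same next-token cross-entropy; only the data distribution changes. A rough sketch (hypothetical names, not any real training stack):

    import math

    def next_token_loss(predicted_probs, target_token):
        # standard cross-entropy on the next token; the same objective in both phases
        return -math.log(predicted_probs[target_token])

    # pre-training:  raw web text      -> a broad pile of human behavior to imitate
    # SFT:           curated dialogues -> a narrower, hand-picked slice of the same thing

The objective doesn't change between phases; what changes is which examples of human behavior get reinforced.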


I think it is a clearer example of deliberately teaching a model specific ways to behave based on human examples.


I think what he's saying (and what I would say, at least) is that again all you're doing is the exact same thing - tuning the weights that drive the correlations. For an analogy: in a video game, if you code a dragon such that its elevation changes over time while you play a wing-flapping animation, you're obviously not teaching it dragon-like behaviors, but rather simply trying to create a mimicry of the appearance of flying using relatively simple mathematical tools and "tricks." And indeed even basic neural-network game bots benefit from RLHF/SFT.


You are a mathematical predictor based on relatively primitive correlations across a large set of inputs.

The gap between you and an LLM is hilariously small.


No you're not. Humans started with literally nothing, not even language. We went from an era with no language and with the greatest understanding of technology being 'poke them with the pointy side' to putting a man on the Moon, unlocking the secrets of the atom, and much more. And given how inefficiently we store and transfer knowledge, we did it in what was essentially the blink of an eye.

Give an LLM the entire breadth of human knowledge at that time and it would do nothing except remix what we knew at that point in history, forever. You could give it infinite processing power, and it still wouldn't move beyond "poke them with the pointy side."

