Will incorporate false-positive rates into the rubric from the next run onwards.
At winfunc, we spent a lot of research time taming these models to eliminate false positives (the rate is high!), so this does feel important enough to document. Thanks!
That is dependent upon the quality of the AI. The argument is not about the quality of the components but the method used.
It's trivial to say using an inadequate tool will have an inadequate result.
It's only an interesting claim if you are saying that no obtainable quality of tool can produce an adequate result. (In this argument, the adequate result in question is a developer with an understanding of what they produce.)
I especially love the quadratic fit, chosen with no justification, that brings the US within the uncertainty envelope in the second plot. Also notice how much work the Mexico and USA data points are doing in the previous linear model fit. Oh, my high-leverage data point can't be an outlier because it's within uncertainty when I fit the data with the potential outlier included. This is basic linear model validation stuff.
> Has a hard to explain fixation on doing things a certain way, e.g. always wants to use panics on errors (panic!, unreachable!, .expect etc) or wants to do type erasure with Box<dyn Any> as if that was the most idiomatic and desirable way of doing things
Yes! I see this constantly. I have a Rust guide that Claude adheres to maybe 50% of the time. It also loves to allocate despite my guide having a whole section about different ways to avoid allocations.
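For readers who don't write Rust, here's a minimal sketch of the fixation the quote describes: panicking on a recoverable error instead of propagating it with `Result`. The port-parsing function is a made-up example, not anything from the original comments.

```rust
use std::num::ParseIntError;

// The pattern being criticized: treating a recoverable error as fatal.
// Bad input crashes the whole program.
fn parse_port_panicky(s: &str) -> u16 {
    s.parse().expect("invalid port")
}

// The idiomatic alternative: return a Result and let the caller decide.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.parse()
}

fn main() {
    assert_eq!(parse_port("8080"), Ok(8080));
    assert!(parse_port("not-a-port").is_err()); // handled, no panic
    let _ = parse_port_panicky("8080"); // works, but fragile on bad input
}
```

`panic!`, `unreachable!`, and `.expect` have legitimate uses for genuinely unrecoverable states; the complaint is about reaching for them as the default.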
Two use-cases recently where Claude sucked for me:
1. Performance-critical code to featurize byte slices for use in an ML model. Claude kept trying to take multiple passes over the slice when the featurization can obviously be done in one. After I finally got it to do the featurization in one pass, it was double-counting some bytes but not others (double-counting all of them would have been fine, since the feature vector gets normalized). Overall it was just very frustrating because this should have been straightforward and instead it was dogshit.
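The shape of the problem, sketched with a hypothetical feature set (the original features aren't described): bucket every byte into one of a few classes in a single pass, counting each byte exactly once, then normalize.

```rust
// Hypothetical featurizer: fractions of ASCII letters, digits, and other
// bytes. One pass over the slice; every byte lands in exactly one bucket,
// so nothing is double-counted.
fn featurize(bytes: &[u8]) -> [f32; 3] {
    let mut counts = [0u32; 3];
    for &b in bytes {
        let bucket = match b {
            b'a'..=b'z' | b'A'..=b'Z' => 0,
            b'0'..=b'9' => 1,
            _ => 2,
        };
        counts[bucket] += 1;
    }
    // Normalize so the feature vector sums to 1.0 (guard empty input).
    let total = bytes.len().max(1) as f32;
    [
        counts[0] as f32 / total,
        counts[1] as f32 / total,
        counts[2] as f32 / total,
    ]
}

fn main() {
    assert_eq!(featurize(b"abc123"), [0.5, 0.5, 0.0]);
}
```

The mutually exclusive `match` arms are what make double-counting structurally impossible, which is the property the multi-pass versions kept breaking.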
2. Performance-critical code that iterates over lines of text and possibly applies transformations, similar to sed. Claude kept trying to allocate new Strings inside the hot loop for lines that were not transformed. When I told it to use Cow<'a, str> instead, so that the untransformed lines, which make up the majority of processed lines, would not need a new allocation, Claude completely fucked up the named lifetimes. Importantly, my CLAUDE.md already tells Claude to use copy-on-write types to reduce allocations whenever possible. The agent just ignored it, which is _the_ issue with LLMs: they're non-deterministic, and any guidance you provide is ultimately just a suggestion.
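For context, the `Cow` pattern in question looks roughly like this (the tab-expansion transform is invented for illustration; the actual transformation isn't specified). Note that lifetime elision (`Cow<'_, str>`) covers this case, so no named lifetime is even needed:

```rust
use std::borrow::Cow;

// Transform a line only when necessary. The common case (no change)
// borrows the input and performs zero allocations.
fn transform(line: &str) -> Cow<'_, str> {
    if line.contains('\t') {
        // Hypothetical transformation: expand tabs to four spaces.
        Cow::Owned(line.replace('\t', "    "))
    } else {
        Cow::Borrowed(line)
    }
}

fn main() {
    let input = "plain line\nhas\ttab\n";
    for line in input.lines() {
        // In the hot loop, most iterations take the Borrowed branch.
        let out = transform(line);
        println!("{out}");
    }
}
```

The point of the pattern is exactly what the comment says: the allocation happens only on the minority of lines that actually change.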
It's not just assuming that everyone learns the same way. It's assuming that everyone learns the way that all of the research literature on learning claims does not work.
Learning requires active recall/synthesis. Looking at solved examples instead of working them yourself does not suffice in math, physics, chemistry, or CS, but somehow it is supposed to work in this situation?
Machine learning people use "tensor" to just mean an N-dimensional array of numbers. The term is divorced from its meaning in physics and mathematics, which caused me some confusion when I started looking at machine learning papers coming from physics.
But the energy transported to Earth from your space power plant still creates waste heat when it is used to do work (and also when it is transported to Earth). You cannot beat the second law.