Will incorporate false-positive rates into the rubric from the next run onwards.
At winfunc, we spent a lot of research time taming these models to eliminate false positives (the rate is high!), so this does feel important enough to document. Thanks!
That is dependent upon the quality of the AI. The argument is not about the quality of the components but the method used.
It's trivial to say using an inadequate tool will have an inadequate result.
It's only an interesting claim if you are saying that no obtainable quality of tool can produce an adequate result. (In this argument, the adequate result in question is a developer with an understanding of what they produce.)
I especially love the quadratic fit, chosen with no justification, that brings the US within the uncertainty envelope in the second plot. Also notice how much work the Mexico and USA data points are doing in the previous linear model fit. Oh, my high-leverage data point can't be an outlier because it's within uncertainty when I fit the data with the potential outlier included. This is basic linear model validation stuff.
> Has a hard to explain fixation on doing things a certain way, e.g. always wants to use panics on errors (panic!, unreachable!, .expect etc) or wants to do type erasure with Box<dyn Any> as if that was the most idiomatic and desirable way of doing things
Yes! I see this constantly. I have a Rust guide that Claude adheres to maybe 50% of the time. It also loves to allocate despite my guide having a whole section about different ways to avoid allocations.
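For readers who don't write Rust, here's a minimal sketch of the fixation the quote describes: panicking on a recoverable error instead of propagating it with `Result`. The port-parsing function is a made-up example, not anything from the original comments.

```rust
use std::num::ParseIntError;

// The pattern being criticized: treating a recoverable error as fatal.
// Bad input crashes the whole program.
fn parse_port_panicky(s: &str) -> u16 {
    s.parse().expect("invalid port")
}

// The idiomatic alternative: return a Result and let the caller decide.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.parse()
}

fn main() {
    assert_eq!(parse_port("8080"), Ok(8080));
    assert!(parse_port("not-a-port").is_err()); // handled, no panic
    let _ = parse_port_panicky("8080"); // works, but fragile on bad input
}
```

`panic!`, `unreachable!`, and `.expect` have legitimate uses for genuinely unrecoverable states; the complaint is about reaching for them as the default.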
Two use-cases recently where Claude sucked for me:
1. Performance-critical code to featurize byte slices for use in an ML model. Claude kept trying to take multiple passes over the slice when the featurization can obviously be done in one. After I finally got it to do the featurization in one pass, it was double-counting some bytes but not others (double-counting all of them would have been fine, since the feature vector gets normalized). Overall it was just very frustrating because this should have been straightforward and instead it was dogshit.
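The shape of the problem, sketched with a hypothetical feature set (the original features aren't described): bucket every byte into one of a few classes in a single pass, counting each byte exactly once, then normalize.

```rust
// Hypothetical featurizer: fractions of ASCII letters, digits, and other
// bytes. One pass over the slice; every byte lands in exactly one bucket,
// so nothing is double-counted.
fn featurize(bytes: &[u8]) -> [f32; 3] {
    let mut counts = [0u32; 3];
    for &b in bytes {
        let bucket = match b {
            b'a'..=b'z' | b'A'..=b'Z' => 0,
            b'0'..=b'9' => 1,
            _ => 2,
        };
        counts[bucket] += 1;
    }
    // Normalize so the feature vector sums to 1.0 (guard empty input).
    let total = bytes.len().max(1) as f32;
    [
        counts[0] as f32 / total,
        counts[1] as f32 / total,
        counts[2] as f32 / total,
    ]
}

fn main() {
    assert_eq!(featurize(b"abc123"), [0.5, 0.5, 0.0]);
}
```

The mutually exclusive `match` arms are what make double-counting structurally impossible, which is the property the multi-pass versions kept breaking.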
2. Performance-critical code that iterates over lines of text and possibly applies transformations, similar to sed. Claude kept trying to allocate new Strings inside the hot loop for lines that were not transformed. When I told it to use Cow<'a, str> instead, so that the untransformed lines, which make up the majority of processed lines, would not need a new allocation, Claude completely fucked up the named lifetimes. Importantly, my CLAUDE.md already tells Claude to use copy-on-write types to reduce allocations whenever possible. The agent just ignored it, which is _the_ issue with LLMs: they're non-deterministic, and any guidance you provide is ultimately just a suggestion.
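For context, the `Cow` pattern in question looks roughly like this (the tab-expansion transform is invented for illustration; the actual transformation isn't specified). Note that lifetime elision (`Cow<'_, str>`) covers this case, so no named lifetime is even needed:

```rust
use std::borrow::Cow;

// Transform a line only when necessary. The common case (no change)
// borrows the input and performs zero allocations.
fn transform(line: &str) -> Cow<'_, str> {
    if line.contains('\t') {
        // Hypothetical transformation: expand tabs to four spaces.
        Cow::Owned(line.replace('\t', "    "))
    } else {
        Cow::Borrowed(line)
    }
}

fn main() {
    let input = "plain line\nhas\ttab\n";
    for line in input.lines() {
        // In the hot loop, most iterations take the Borrowed branch.
        let out = transform(line);
        println!("{out}");
    }
}
```

The point of the pattern is exactly what the comment says: the allocation happens only on the minority of lines that actually change.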
It's not just assuming that everyone learns the same way. It's assuming that everyone learns the way that all of the research literature on learning claims does not work.
Learning requires active recall/synthesis. Looking at solved examples instead of working them yourself does not suffice in math, physics, chemistry, or CS, but somehow it is supposed to work in this situation?
Machine learning people use "tensor" to just mean an N-dimensional array of numbers. The term is divorced from its meaning in physics and mathematics, which caused me some confusion when I started looking at machine learning papers coming from physics.
But the energy transported to Earth from your space power plant still creates waste heat when it is used to do work (and also when it is transported to Earth). You cannot beat the second law.