Training an LLM from scratch involves carefully curating the data first. The idea that it just memorizes the whole web is a nice, simplified mental model, but it glosses over a huge amount of hard work deciding which websites are authoritative, and on which subjects. That simplification isn’t fooling anybody except rank amateurs.
The README claims it costs 50-100 bytes per token when rendered to UTF-8 text and wrapped in JSON. Citation needed, please. JSON can be inefficient if you have lots of keys or pretty-printed whitespace, but UTF-8 itself is very efficient. I don’t see it.
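A quick back-of-envelope check in Python (the payload shapes here are hypothetical, not whatever the README actually serializes): minimally wrapped tokens cost well under 50 bytes each, and you only approach that range once every token drags metadata keys and pretty-printing along with it.

```python
import json

# Hypothetical payload shapes -- not the README's actual format.
tokens = ["The", " quick", " brown", " fox", " jumps"]  # typical English tokens

# Minimal wrapping: one short key per token, compact separators.
compact = json.dumps([{"t": t} for t in tokens], separators=(",", ":"))

# Verbose wrapping: per-token metadata plus pretty-printed whitespace.
verbose = json.dumps(
    [{"token": t, "logprob": -1.23, "index": i} for i, t in enumerate(tokens)],
    indent=2,
)

print(len(compact.encode("utf-8")) / len(tokens))  # ~14 bytes/token
print(len(verbose.encode("utf-8")) / len(tokens))  # ~65-70 bytes/token
```

So the 50-100 byte figure would only hold if each token carries a pile of per-token keys, not for plain UTF-8 text in a JSON string.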
People die either way. People die because treatments sit in the lab for years instead of reaching clinics.
Consider the FLASH oncology treatment. In 1995, Dr. Favaudon figured out how to make radiation therapy much, much safer. He then sat on it for 14 years before sharing it in 2009, and it wasn’t actually published until 2014. [1]
For all the doubt and negativity here, I just want to say “good job” to you. Way to take matters into your own hands and protect your loved ones. Haters gonna hate, but you did it.
You’re holding on to the intuition (hope) that we are smarter than the LLMs in some hard-to-define way. Maybe. But it’s getting harder and harder to define a task that humans beat LLMs on. On pretty much any easily quantifiable test of knowledge or reasoning, the machines win. I agree experienced humans are still better at “judgement” tasks in their field. But the judgement tasks are kinda necessarily ones where there isn’t a correct answer. And even then, I think the machines’ judgement is better than a lot of humans’.
Is medical diagnosis one of these high judgement tasks? Personally I don’t think so.
LLMs operate on a mechanical form of intelligence, one that at present is not adaptive to changes in the environment.
If the latter part of your post were true, how come the demand for radiologists has grown? The problem with this place is it’s full of people who don’t understand nuance. And your post demonstrates this emphatically.
For me there are a few main takeaways on how AI _could_ supersede the average ER doctor.
The first is that a technical solution can be trained on _ALL_ medical data and have access to all of it in the moment. It is hard to imagine a doctor ever achieving this.
The second is that, for medical cases, understanding the sum of all symptoms and the patient's vitals would lead to an accurate diagnosis a majority of the time. AI/ML is entirely about pattern recognition; when you combine this with point one, you end up with a system that can diagnose a large portion of patients in extremely short timeframes (a toy sketch of this framing follows below).
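To make the pattern-recognition framing concrete, here is a toy sketch. Everything in it is made up: synthetic vitals, a hypothetical "infection" label, and an off-the-shelf classifier. A real diagnostic system would need real data, far richer features, and clinical validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy illustration only: synthetic vitals and a made-up label.
rng = np.random.default_rng(0)
n = 2000
heart_rate = rng.normal(80, 15, n)      # beats/min
temperature = rng.normal(37.0, 0.8, n)  # degrees C
systolic_bp = rng.normal(120, 15, n)    # mmHg
wbc = rng.normal(7.5, 2.5, n)           # white cells, 10^9/L
X = np.column_stack([heart_rate, temperature, systolic_bp, wbc])

# Hypothetical ground truth: fever and elevated WBC (plus noise) -> infection.
risk = (temperature - 37.0) + 0.3 * (wbc - 7.5) + rng.normal(0, 0.5, n)
y = (risk > 0.5).astype(int)

# The "diagnosis" is then just a learned mapping from vitals to label.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```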
On a different note, I think we can leave the ad-hominem attacks at home, please.
> But it’s getting harder and harder to define a task that humans beat LLMs on. On pretty much any easily quantifiable test of knowledge or reasoning, the machines win.
I, and likely the person you replied to, don't think the existing studies actually bear this out.
> But it’s getting harder and harder to define a task that humans beat LLMs on. On pretty much any easily quantifiable test of knowledge or reasoning, the machines win.
Quite to the contrary, I think it's extremely trivial to find a task where humans beat LLMs.
For all the money that's been thrown at agentic coding, LLMs still produce substantially worse code than a senior dev. See my own prior comments on this for a concrete example [1].
These trivial failure cases show that there are dimensions to task proficiency - significant ones - that benchmarks fail to capture.
> Is medical diagnosis one of these high judgement tasks?
Situational. I would break diagnosis into three types:
1. The diagnosis comes from objective criteria - laboratory values, vital signs, visual findings, family history. I think LLMs are likely already superior to humans in this case.
2. The diagnosis comes from "chart lore" - reading notes from prior physicians and realizing that new context now points to a different diagnosis. (That new context can be the benefit of hindsight into what they already tried and failed, and/or new objective data.) LLMs do pretty well at this when you point them at datasets where all the prior notes were written by humans, which means those humans did a nontrivial part of the diagnostic work. What if the prior notes were written by LLMs as well? Will they propagate their own mistakes forward? That has yet to be studied in depth.
3. The diagnosis comes from human interaction - knowing the difference between a patient who's high as a bat on crack and one who's delirious from infection; noticing that a patient hesitates slightly before they assure you that they've been taking all their meds as prescribed; etc. I doubt that LLMs will ever beat humans at this, but if LLMs can be proven to be good at point 2, then point 3 alone will not save human physicians.
> I doubt that LLMs will ever beat humans at this, but if LLMs can be proven to be good at point 2, then point 3 alone will not save human physicians.
Agree with your division, but I'm baffled by this argument. If humans are better than machines at point 3 and can also use a machine to do point 2, then unless they have particularly terrible biases against taking point-2 data into account, they're going to be strictly better than machines alone. Doctors have costs, but they're costs people/society are generally willing to underwrite, and misdiagnosis also has costs...
There are almost no real-world tasks that LLMs outperform humans on while operating by themselves. Pair them with a human for adaptability, judgement, and real-world context, and let the human drive, sure. Just let one loose on its own? You get an ocean of slop that doesn't come close to doing what it's supposed to.
Wherein OpenAI admits they have very little understanding of how their models’ personality develops. And implicitly admits it’s not all that important to them, except when it gets so out of hand that they get caught making blunt corrections.