Isn’t this what AGI is by design? People CAN learn to become good at videogames. Modern LLMs can’t; they have to be retrained from scratch (I consider pre-training to be a completely different process than learning). I also don’t necessarily agree that a grandma would fail. Give her enough motivation and a couple of days and she’ll manage these.
My main criticism would be that it doesn’t seem like this test allows online learning, which is what humans do (over the scale of days to years). So in practice it may still collapse to what you point out, but not because the task is unsuited to showing AGI.
What I'm saying is that this test is just another "out-of-distribution task" for an LLM. And it will be solved using the exact same methods we always use: it will end up in the pre-training data, and LLMs will crush it.
This has absolutely nothing to do with AGI. Once they beat these tests, new ones will pop up. They'll beat those, and people will invent the next batch.
The way I see it, the true formula for AGI is: [Brain] + [External Sensors] (World Receptors) + [Internal State Sensors] + [Survival Function] + [Memory].
I won't dive too deep into how each of these components has its own distinct traits and is deeply intertwined with the others (especially the survival function and memory). But on a fundamental level, my point is that we are not going to squeeze AGI out of LLMs just by throwing more tests and training cycles at them.
These current benchmarks aren't bringing us any closer to AGI. They merely prove that we've found a new layer of tasks that we simply haven't figured out how to train LLMs on yet.
P.S. A 2-year-old child is already an AGI in terms of its functional makeup and internal interaction architecture, even though they are far less equipped for survival than a kitten. The path to AGI isn't just endless task training—it's a shift toward a fundamentally different decision-making architecture.
Good post, but I disagree that a Survival Function is needed for AGI. Why do you think a Survival Function is needed?
The item I think you should add is a Mesolimbic System (Reward / Motivation). I think AGI needs motivation to direct its learning and tasks.
Also, I don't think the industry has just been training LLMs with more data to get advancements over the last 2 years. RAG / agent loops / skills / context mgmt are all just early forms of a Memory system. An LLM with an updatable working-set memory is a lot more capable than a bare LLM.
Kids develop video game skills, grandmothers do not. Hypothetically grandmothers develop baking skills, that kids do not (perfectly golden brown cookies). A human intelligence is generally capable of developing video game skills or baking skills, given enough motivation and experience to hone those skills. One test of AGI is if the same system can develop video game skills and baking skills, without having to rebuild the core models... this would demonstrate generalized intelligence.
Disagree on the last statement. Makie is tremendously superior to matplotlib. I love ggplot but it is slow, as all of R is. And my work isn’t so heavy on statistics anyway.
Makie has the best API I’ve seen (mostly matlab / matplotlib inspired), the easiest layout engine, the best system for live interactive plots (Observables are amazing), and the best performance for large data and exploration. It’s just a phenomenal visualization library for anything I do. I suggest everyone give it a try.
Matlab is the only one that comes close, but it has its own pros and cons. I could write about the topic in detail, as I’ve spent a lot of time trying almost everything that exists across the major languages.
I love Makie, but for investigating our datasets Python is overall superior (I am not familiar enough with R), despite Julia having the superior array syntax and Makie the better API. This is simply because of the brilliant library support available in scikit-learn and the whole compilation overhead/TTFX issue. For these workflows it's a huge problem that restarting your interactive session takes minutes instead of seconds.
I recently used Makie to create an interactive tool for inspecting nodes of a search graph (dragging, hiding, expanding edges, custom graph layout), with floating windows of data and buttons. Yes, it's great for interactive plots (you can keep using the REPL to manipulate the plot, no freezing), yes Observables and GridLayout are great, and I was very impressed with Makie's plotting abilities from making the basics easy to the extremely advanced, but no, it was the wrong tool. Makie doesn't really do floating windows (subplots), and I had to jump through hoops to create my own float system which uses GridLayout for the GUI widgets inside them. I did get it to all work nearly flawlessly in the end, but I should probably have used a Julia imGUI wrapper instead: near instant start time!
Yes. And I did port my GUI layer to CImGui.jl. The rest of it is pretty intertwined with Makie, so I haven't done that yet. The Makie version does look better than ImGui though.
I tried some Julia plotting libraries a few years ago and they had APIs that were bad for interactively creating plots, as well as often being buggy. I don’t have performance problems with ggplot, so that’s what I tend to lean toward. Matplotlib being bad isn’t much of a problem anymore, as LLMs can translate from ggplot to matplotlib for you.
Some quick napkin math: AI energy usage for a chat like the one in the post (estimated ~100 Wh) is comparable to driving ~100 m in the average car, making a slice of toast, or bringing 1 liter of water to a boil.
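For anyone who wants to check the napkin math, here's a sketch. The per-chat figure, the car's efficiency, and the toaster's power are all rough assumptions of mine, not measured values:

```python
CHAT_WH = 100  # assumed energy for one long AI chat, in watt-hours

# Car: assume ~0.8 kWh/km for an average gasoline car (~8 L/100 km)
car_meters = CHAT_WH / 800 * 1000          # -> 125 m

# Toast: assume a ~1 kW toaster running ~4 minutes per slice
toast_slices = CHAT_WH / (1000 * 4 / 60)   # -> 1.5 slices

# Boiling: heat 1 L of water from 20 C to 100 C
# specific heat 4186 J/(kg*K), 80 K rise, 3600 J per Wh
boil_wh = 4186 * 80 / 3600                 # -> ~93 Wh per liter

print(car_meters, toast_slices, boil_wh)
```

All three land within a factor of ~1.5 of the 100 Wh figure, which is as good as napkin math gets.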
I’d wager the average American spends more than 20 dollars/month on meat overall, but let’s say they spend as much as an OpenAI subscription on beef. If you truly believe in free markets, then the two have the same environmental impact. But which one has more externalities? Many supply-chain analyses have been done, which you can look up. As one might expect, the numbers don’t look good for beef.
There’s an expected number of defects per wafer. If a chip has a defect, it is lost (a simplification). A wafer with 100 chips may lose 10 to defects, giving a 90% yield. The same wafer with 1000 smaller chips would still lose only 10 of them, giving a 99% yield.
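A minimal sketch of that yield arithmetic, under the same simplifying assumption (a fixed number of defect sites per wafer, each killing exactly one chip):

```python
def wafer_yield(chips_per_wafer: int, defects: int = 10) -> float:
    """Fraction of good chips, assuming each defect kills exactly one chip."""
    return (chips_per_wafer - defects) / chips_per_wafer

print(wafer_yield(100))   # -> 0.9  (90% yield for 100 big chips)
print(wafer_yield(1000))  # -> 0.99 (99% yield for 1000 small chips)
```

The defect count stays the same; only the fraction of the wafer each defect ruins shrinks.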
As another comment in this thread mentions, Cerebras seems to have solved this by making their big chip out of many much smaller cores that can be disabled if they have defects.
Indeed, the original comment you replied to actually made no sense in this case. But there seemed to be some confusion in the thread, so I tried to clear that up. I hope I’ll get to talk with one of the cerebras engineers one day, that chip is really one of a kind.
If you let the computer run for long enough, it will compute any atomic spectrum to arbitrary accuracy. Only QFT has divergent series, so at least in theory we expect these calculations to converge.
There’s an intrinsic physical limit to how finely you can resolve a spectrum, so arbitrarily many digits of precision aren’t exactly a worthy pursuit anyway.
Feynman is indeed often cited among the first people to propose the idea of a quantum computer! This talk he gave in ‘81 is among the earliest discussions of why a quantum universe requires a quantum computer to simulate it [1]:
> Can a quantum system be probabilistically simulated by a classical (probabilistic, I'd assume) universal computer? In other words, a computer which will give the same probabilities as the quantum system does. If you take the computer to be the classical kind I've described so far, (not the quantum kind described in the last section) and there're no changes in any laws, and there's no hocus-pocus, the answer is certainly, No! This is called the hidden-variable problem: it is impossible to represent the results of quantum mechanics with a classical universal device.
Another unique lecture is a 1959 one [2] about the potential of nanotechnology (not even a real field back then). He speaks of directly manipulating atoms and building angstrom-scale engines and microscopes with a highly unusual perspective; extremely fascinating for anyone curious about these things and their history. Even by Feynman’s standards, this was a unique mix of topics and terminology. For context, the structure of DNA had been discovered only about 5 years prior, and the first instruments capable of atomic imaging and manipulation didn’t arrive until the 1980s.
If you’re captivated by this last one as I was, I can also recommend Greg Bear’s novel “Blood Music”. It doesn’t explore the nanotechnology side much, but the main hook is biological cells as computers. Gets very crazy from there on.
If you're into atomic physics and want a feel for the intricate structure of the basic processes, the best find I've had recently is this MIT course by Wolfgang Ketterle. The first lecture is an informal overview, and he gives vivid, detailed descriptions of the phenomena they can now create and control: why we see different kinds of things happening at very low temperatures (the atoms move past each other so slowly that their wavefunctions have time to overlap and interact), using intersecting lasers to create arrays of dimples in the electromagnetic field that draw in and hold single atoms, that kind of thing. It gives a more tangible insight into the quantum aspects of matter, which can otherwise seem inscrutable.
The quote is not suggesting a quantum computer can’t be simulated classically; it can, in fact, just slowly, by keeping track of the quantum state, where n qubits require 2^n complex amplitudes.
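A minimal sketch of that exponential bookkeeping, assuming the usual state-vector representation (my illustration, not from the comment): n qubits become an array of 2^n complex amplitudes, and any single-qubit gate is a 2x2 matrix applied along one axis of that array.

```python
import numpy as np

def apply_1q(state: np.ndarray, gate: np.ndarray, target: int, n: int) -> np.ndarray:
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    psi = state.reshape([2] * n)                       # one axis per qubit
    psi = np.tensordot(gate, psi, axes=([1], [target]))
    psi = np.moveaxis(psi, 0, target)                  # restore qubit order
    return psi.reshape(-1)

n = 3
state = np.zeros(2**n, dtype=complex)                  # 2^n amplitudes
state[0] = 1.0                                         # start in |000>

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)           # Hadamard gate
state = apply_1q(state, H, 0, n)                       # superposition on qubit 0
```

Memory doubles with every added qubit, which is why ~50 qubits is already beyond any classical machine's RAM.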
It relates more to the Bell results, that there doesn’t exist a hidden variable system that’s equivalent to QM.
“There’s Plenty of Room at the Bottom” only really took off in popularity decades later. Feynman’s accomplishments are undeniable, Nobel prize and all, but his celebrity status comes from other aspects of his personality. I can’t think of a living Feynman equivalent today. Perhaps Geoffrey Hinton and his views on the risk of AGI? Though he’s far from the only one, of course.
You seem to be familiar with the field, yet this is a very strange view. I work on exactly this slice of solid-state physics and semiconductor devices, and I’m not sure what you mean here.
The way we construct Hamiltonians is indeed somewhat ad hoc at times, but that’s not because of a lack of fundamental knowledge. In fact, the only inputs you need are the masses of the electron and proton and the quantum of charge. Everything else is fully derived and justified, as far as I can tell. There’s really nothing in solid-state devices other than the extremely low-energy limit of QED; from there it’s about scaling up to many-body systems, which are computationally intractable but fully justified.
We don’t even use relativistic QM 95% of the time. Spin-orbit terms require it, but once you’ve derived the right coefficients (only needed once) you can drop the Dirac equation and go back to Schrödinger. The need for empirical models has nothing to do with fundamental physics and everything to do with the exorbitant complexity of many-body systems. We don’t use QFT and the standard model simply because, as far as I can tell, the computation would never scale. That’s not really a fault of the standard model.
This is not as good a refusal as you think it is. To me (and I imagine, the parent poster) there is no extra logical step needed. The problem IS solved in this sense.
If it’s completely impossible to even imagine what the answer to a question is, as is the case here, it’s probably the wrong question to pose. Is there any answer you’d be satisfied by?
To me the hard problem is more or less akin to looking for the true boundaries of a cloud: a seemingly valid quest, but one that can’t really be answered in a satisfactory sense, because it’s not the right one to pose to make sense of clouds.
> If it’s completely impossible to even imagine what the answer to a question is, as is the case here, it’s probably the wrong question to pose. Is there any answer you’d be satisfied by?
I would be very satisfied to have an answer, or even just convincing heuristic arguments, for the following:
(1) What systems experience consciousness? For example, is a computer as conscious as a rock, as conscious as a human, or somewhere in between?
(2) What are the fundamental symmetries and invariants of consciousness? Does it impact consciousness whether a system is flipped in spacetime, skewed in spacetime, isomorphically recast in different physical media, etc.?
(3) What aspects of a system's organization give rise to different qualia? What does the possible parameter space (or set of possible dynamical traces, or what have you) of qualia look like?
(4) Is a consciousness a distinct entity, like some phase transition with a sharp boundary, or is there no fundamentally rigorous sense in which we can distinguish each and every consciousness in the universe?
(5) What explains the nature of phenomena like blindsight or split brain patients, where seemingly high-level recognition, coordination, and/or intent occurs in the absence of any conscious awareness? Generally, what behavior-affecting processes in our brains do and do not affect our conscious experience?
And so on. I imagine you'll take issue with all of these questions, perhaps saying that "consciousness" isn't well defined, or that an "explanation" can only refer to functional descriptions of physical matter, but I figured I would at least answer your question honestly.
(1) is perhaps more of a question requiring a strict definition of consciousness in the first place, making it mostly circular. (2) and especially (3) are the most interesting, but they seem part of the easy problem instead. And I’d say we already have indications that the latter option of (4) is true, given your examples from (5) and things like sleep (the most common reason for humans to be unconscious) being in distinct phases with different wake up speed (pun partially intended). And if you assume animals to be conscious, then some sleep with only one hemisphere at a time. Are they equally as conscious during that?
In my imagined timeline of the future, scientific advances would lead to us noticing what’s different between a person’s brain in its conscious and unconscious states, then somehow generalizing that into a more abstract model of cognition decoupled from our biological implementation, and eventually tackling all your questions from there. But I suspect the person I originally replied to would dismiss those as part of the easy problem instead, i.e. completely useless for tackling the hard problem! As far as I’m concerned, it’s the hard problem that I take issue with, and the one that I claim isn’t real.
I very much agree, especially on the importance of defining what we mean by the word "consciousness" before we say we cannot explain it. Is a rock conscious? Sure, according to some definition of the word. Probably everybody would agree that there are different levels of consciousness, and maybe we'd need different names for them.
Animals are clearly conscious in that they observe the world and react to it and even try to proactively manipulate it.
The next level of consciousness, and what most people probably mean when they use the word, is the human ability to "think in language". That opens up a whole new level of consciousness, because now we can be conscious of our inner voice. We are conscious of ourselves, apart from the world. Our inner voice can say things about the thing which seems to be the thing uttering the words in our mind: me.
Is there anything more to consciousness than us being aware that we are conscious? It is truly a wondrous experience which may seem like a hard problem to explain, hence the "Hard Problem of Consciousness", right? But it's not so mysterious if we think of it in terms of being able to use and hear and understand language. Without language our consciousness would be on the level of most animals I assume. Of course it seems that many animals use some kind of language. But, do they hear their "inner voice"? Hard to say. I would guess not.
And so again, in simple terms, what is the question?
This is precisely the matter, I wholeheartedly agree. The metacognition that we have, and that only humans are likely to have, is the root of the millennia-long discussions on consciousness. And the hard problem stems from whatever was left of traditional philosophy hitting the wall of modern scientific progress, not wanting to let go of the mind as some metaphysical entity beyond reality, with qualia and however many ineffable private properties.
The average person may not know the word qualia, but “is your red the same as my red” is a popular question among kids and adults. Seems to be a topic we are all intrinsically curious about. But from a physical point of view, the qualia of red is necessarily some collection of neurons firing in some pattern, highly dependent on the network topology. Knowing this, then the question (as it was originally posed) is immediately meaningless. Mutatis mutandis, same exact argument for consciousness itself.
Talking of "qualia" I think feeling pain is a good example. We all feel pain from time to time. It is a very conscious experience. But surely animals feel pain as well, and it is that feeling that makes them avoid things that cause them pain.
Evolution just had to give us some way to "feel", to be conscious, about some things causing us pain while other things cause us pleasure. We are conscious of them, and I don't think there's any "hard question" about why we feel them :-)
You also don’t consciously use your senses until you actively think about them. Same as “you are now aware of your breathing”. Sudden changes in a sensation may trigger them to be conscious without “you” taking action, but that’s not so different. You’re still directing your attention to something that’s always been there.
I agree with the poster (and Daniel Dennett and others) that there isn’t anything that needs explaining. It’s just a question-framing problem, much like the measurement problem in quantum mechanics.