That’s precisely what I meant in my comment by “these types of tests.” People ar...

pillefitz · 2025-03-25T05:32:35 1742880755

They are useful to reach Arc-N+1

Chathamization · 2025-03-25T05:46:21 1742881581

How are any of these a useful path to asking an AI to cook dinner?

We already know many tasks that most humans can do relatively easily, yet most people don’t expect AI to be able to do them for years to come (for instance, L5 self-driving). ARC-AGI appears to be going in the opposite direction - can these models pass tests that are difficult for the average person to pass.

These benchmarks are interesting in that they show increasing capabilities of the models. But they seem to be far less useful at determining AGI than the simple benchmarks we’ve had all along (can these models do everyday tasks that a human can do?).

fastball · 2025-03-25T08:28:42 1742891322

The "everyday tasks" you specifically mention involve motor skills that are not useful for measuring intelligence.

mchusma · 2025-03-25T21:36:48 1742938608

Genuine question, do you feel Waymo is not L5 self-driving? I Waymo has L5 but its not truly economic yet.