Agreed. This maps directly to the white-box vs black-box testing distinction: either you own your priors and trace the full data lineage from training through validation, or you're relying on an opaque validation set of unknown provenance. And that's before factoring in the organizational politics.
Great work! What is the difference between writing a shell script to solve "cloud setup, ETL pipelines, orchestration, cost monitoring", and using a local app?
I used to write a shell script to do all this. Then the number of scripts started adding up, got overly complex. This local app here is what evolved from all that, but with more reliable query running, compute management, spark environment, and above all UI and AI that can make this process seamless than a cluttered CLI UX.
Interesting paper, it reminds me how we human improving our math problem solving capability. But those efficient guys use few problems(and reasoning trajectories) to build their solving skills, how to build this effective human training process into LLM's?
reply