Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

What are the benefits of Aristotle over a general-purpose coding assistant like Claude Code?


Aristotle's output is formally verified in Lean, so you can run it for days on a hard problem and be assured that the answer, no matter how complex, is right without needing to manually check it.

Claude Code can write lean, but we do a heck of a lot of RL on theorem proving, so Aristotle winds up being much better at writing Lean than other coding agents are.


Seeing a task-specific model be consistently better at anything is extremely surprising given rapid innovation in foundation models.

Have you tried Aristotle on other, non-Lean tasks? Is it better at logical reasoning in general?


Is it though? There is a reason gpt has codex variants. RL on a specific task raises the performance on that task


Post-training doesn't transfer over when a new base model arrives so anyone who adopted a task-specific LLM gets burned when a new generational advance comes out.


Resouce-affording, if you are chasing the frontier of some more niche task you redo your training regime on the new-gen LLMs


how strong is your internal informal LLM at theorem-proving before the formalization stage? or it's combined in a way so that is not measurable?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: