
Fine, but to me reasoning is the thing where you have <think> tags and use RL to decide what gets generated in between them.

Of course, people regarded things like GSM8K with trained reasoning traces as reasoning too, but that's pretty obviously not quite the same thing.


