> No, it's done to turn an uncreative model into a creative model. This idea that sampling isn't that important or is some violation of the bitter lesson is exactly why I had to call out the whole academic field as having a giant blindspot for this kind of research in our oral presentation at ICLR!
I see this sentiment a lot. There are even people who swear by samplers like XTC (which sounds counterintuitive af), but it's always on "creative" tasks. On math tasks, with a clear correct/incorrect answer, none of the "creative" samplers come out on top, not even min_p (except at crazy temperatures, and even there the overall accuracy is still lower than normal temps w/ normal sampling)...
The main problem is that "creativity" is such a subjective measure that it's hard to score properly.
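For anyone who hasn't looked at what min_p actually does, here is a minimal sketch: keep only the tokens whose probability is at least `min_p` times the top token's probability, then renormalize and sample. The function names and the default `min_p=0.1` are illustrative assumptions, not any library's API, and applying temperature before the filter is also an assumption (implementations differ on the order).

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p=0.1):
    """Zero out tokens below min_p * (top probability), then renormalize."""
    cutoff = min_p * max(probs)
    kept = [p if p >= cutoff else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def sample_min_p(logits, temperature=1.0, min_p=0.1, rng=random):
    """Sample one token index. Temperature is applied first here (an assumption)."""
    probs = min_p_filter(softmax(logits, temperature), min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical three-token vocabulary.
logits = [2.0, 1.0, -3.0]
normal = min_p_filter(softmax(logits, temperature=1.0))
hot = min_p_filter(softmax(logits, temperature=100.0))
```

At temperature 1 the unlikely third token is cut. At temperature 100 the distribution is nearly flat, so every token clears the cutoff and the filter does nothing in this toy case.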
I think "crazy" temperatures start around 100, not 2-3 as folks commonly claim in the literature.
You're right in general on this post, but I think you underestimate how many coomers/ERP folks there are and how much they use LLMs. XTC was made for them to give some notion of slop removal. It's probably not quite as good at that task as the antislop sampler (from Sam Paech, EQ-Bench creator) - but I find XTC to be quite good at adding "spice" to outputs.
Re: the difficulty of measuring "creativity": that's especially true, particularly when it comes to scoring it! We have some nitpickers of our own whispering into our ears about this. You don't happen to be at Stanford, do you? IYKYK...
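To make the "counterintuitive" part concrete, here is a rough sketch of XTC (Exclude Top Choices) as I understand it: with some probability, if two or more tokens are above a threshold, drop all of them except the least likely one. The function name, the `rng` argument, and the defaults (`threshold=0.1`, `probability=0.5`) are assumptions for illustration, not a specific backend's implementation.

```python
import random


def xtc_filter(probs, threshold=0.1, probability=0.5, rng=random):
    """Sketch of XTC: sometimes remove the top choices, keep the weakest viable one."""
    # Only apply the filter on a random fraction of sampling steps.
    if rng.random() >= probability:
        return list(probs)
    # Tokens that count as "viable" choices.
    above = [i for i, p in enumerate(probs) if p >= threshold]
    if len(above) < 2:
        return list(probs)  # nothing to exclude without a fallback choice
    # Keep the least probable viable token, drop the other viable ones.
    keep = min(above, key=lambda i: probs[i])
    drop = set(above) - {keep}
    kept = [0.0 if i in drop else p for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


# Hypothetical four-token distribution.
probs = [0.5, 0.3, 0.15, 0.05]
always = xtc_filter(probs, threshold=0.1, probability=1.0)
never = xtc_filter(probs, threshold=0.1, probability=0.0)
```

With `probability=1.0` the two most likely tokens are removed and the mass moves to the tail, which is why this helps variety on creative prompts and would be expected to hurt on tasks with a single right answer.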