Overall, I'm really impressed by what you accomplished! I'm not a researcher, so not sure if this is that helpful, but here are some thoughts:
- I wonder if the "move" action is difficult for the model to learn to use well. The model sees token location as positional encodings in the embedding, not sparse character offsets. Would be interesting to see something more like "jump to next/previous [token or set of tokens]". Or maybe a find/replace like most coding harness edit tools use?
- I'd move the exact training data generation details to an appendix. Could be summarized to improve the flow of the paper.
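To make the find/replace suggestion concrete: a minimal sketch of the kind of exact-match edit action most coding harnesses expose (the name `apply_edit` and the ambiguity check are my assumptions, not taken from the paper):

```python
def apply_edit(text: str, old: str, new: str) -> str:
    """Apply a find/replace edit of the kind coding-agent harnesses use.

    The model specifies an exact `old` snippet rather than a positional
    offset; the edit fails loudly if the snippet is missing or ambiguous,
    which gives the model a clear signal to retry with more context.
    """
    count = text.count(old)
    if count == 0:
        raise ValueError("edit failed: snippet not found")
    if count > 1:
        raise ValueError("edit failed: snippet is ambiguous; add more surrounding context")
    return text.replace(old, new, 1)
```

The appeal over a "move" action is that the model never has to reason about offsets at all; the target location is identified by its own content.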
Hi, thank you for your advice, I really appreciate it!
My model has been able to move pretty naturally throughout the canvas when editing; it remembers the actual canvas, including the order of the tokens, well. But I understand where you're coming from.
Jump to next/previous token is a good idea, and in the future I can definitely look into implementing it, especially for scaling the model up. Same thing with find/replace. Thanks again.
Cool concept! I think the hardest part will be getting people in the target audience to use it. A lot of indie hackers make software for other indie hackers, but that isn't true of most other verticals. And honestly, building software for indie hackers feels like a losing battle. Any ideas on how to incentivize non-builders to rank projects?
From the most recent comment, looks like this is a bug, triggered by the system inadvertently activating an internal release tool [0]. Still a pretty wild bug, but not as dramatic as the title suggests. Which is kind of unfortunate honestly, the chaos of every gas town instance automatically contributing to itself would be beautiful to see.
That was my immediate impression too! It feels like it's all AI maximalists who seem to have a need to filter their every interaction through an LLM. And the result looks and reads just like Moltbook.
Yeah, and the employee who generated an AI response to the AI-generated bug report is Jarred Sumner, the founder of Bun, which was acquired by Anthropic. Pretty sad state of affairs all around.
It feels like (though nobody can prove it) all user-facing applications are fully vibe-coded and no internal developers have any idea how they work, so they just keep redirecting user questions to Claude to answer on their behalf. That's why they're dealing with regressions and downtime every few releases; it's the usual pattern with vibe coding that bugs keep resurfacing.
If all LLM advancements stopped today, but compute and energy got cheap enough that the $30 million zettaflop was possible, I wonder what outcomes would be achievable? Would 1,000 Claudes be able to coordinate in meaningful ways? How much human intervention would be needed?
Headline/article is extremely misleading. They still have subscription plans with included usage, but those usage limits are now based on tokens instead of messages.
I like this, and think it's true for how humans learn. What's interesting to me is that it seems LLMs are significantly smarter than they were two years ago, but it doesn't feel like they have better "taste". Their failure modes are still bizarre and inhuman. I wonder what it is about their architecture/training that scales their experience without corresponding improvements in taste.
In theory, RLVR should encourage less error-prone code, similar to a human getting burned by production outages like the article mentioned. Maybe the scale in training just isn't big enough for that to matter? Perhaps we need better benchmarks that capture long-term issues that arise from bad models and unnecessary complexity.