…it really feels like they’re attempting to reinvent a project tracker and starting off from scratch in thinking about it.
It feels like they’re a few versions behind what I’m doing, which is… odd.
Self-hosting a plane.io instance. Added a plane MCP tool to my codex. Added workflow instructions into Agents.md which cover standards, documentation, related work, labels, branch names, adding comments before the plan, after the plan, and at various steps of implementation, plus a summary before moving a ticket to done. Creating new tickets and relating them to the current one or to others, etc…
It ain’t that hard. Just do inception (high- to mid-level details), create epics and tasks. Add personas, details, notes, acceptance criteria, and more. You can add comments yourself to update. Whatever.
Slice tickets thin and then go wild. Add tickets as you’re working through things. Make modifications.
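Roughly the shape of that Agents.md workflow section, if a sketch helps (wording and branch format are illustrative, not my exact file):

```markdown
## Ticket workflow (sketch)
- Before coding: read the ticket, its labels, linked standards docs, and related tickets.
- Post a comment with the plan before touching code; post another once the plan is confirmed.
- Branch name: <ticket-key>-<short-slug>.
- Comment at each major implementation step and link the commits.
- Before moving a ticket to Done: post a summary comment (what changed, why, follow-ups).
- If scope grows, create a new ticket and relate it to this one instead of expanding it.
```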
This is actually very interesting, I think, as Anthropic pushes against The Bitter Lesson a bit! The model is a great reasoner, but we still need a concrete way to manage tasks - like we needed for tool calling. Claude Code has an opinionated loop, something like ReAct/CoT with prompting tricks for tasks/skills/etc., but here they add a Hierarchical Controller/Worker thing leveraging the Claude SDK. Mixing agency with actual control using program logic - not just alignment using prompts screaming in all caps and emoji.
We are going to break out of the coding agent’s loop in this way - it’s sorta curving back around to Workflows, after leaving them behind for agency, but right now we need to orchestrate this with deterministic code written mostly by humans - like the git repo anthropic shared. This won’t last long.
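A minimal sketch of what I mean by "deterministic code owns the loop", assuming nothing beyond the plain `anthropic` Python package - the task shapes and the single worker helper are made up for illustration, not Anthropic's actual harness:

```python
# Controller/worker sketch: sequencing, state, and the "done" decision are plain
# Python; the model only ever sees one thinly sliced task at a time.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical plan produced during "inception"; in practice this would come
# from the tracker, not a literal list.
plan = [
    {"id": "T-1", "title": "Add /health endpoint", "notes": ""},
    {"id": "T-2", "title": "Wire /health into the readiness probe", "notes": ""},
]

def run_worker(task: dict) -> str:
    """One worker call per task: fresh context, bounded scope."""
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # swap in whatever model id you actually use
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": (
                f"Task {task['id']}: {task['title']}\n"
                f"Prior notes: {task['notes'] or 'none'}\n"
                "Return a short plan, the change, and a summary comment for the ticket."
            ),
        }],
    )
    return msg.content[0].text

# The controller is ordinary program logic: ordering, retries, and promotion to
# done live here, not in a prompt.
for task in plan:
    task["notes"] = run_worker(task)   # persisted notes survive any context reset
    task["status"] = "done"
```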
Used an LLM to help write up the following, as I’m still pretty scattered about the idea and on mobile.
——
Something I’ve been going over in my head:
I used to work in a pretty strict Pivotal XP shop. PM ran the team like a conductor. We had analysts, QA, leads, seniors. Inceptions for new features were long, sometimes heated sessions with PM + Analyst + QA + Lead + a couple of seniors. Out of that you’d get:
- Thinly sliced epics and tasks
- Clear ownership
- Everyone aligned on data flows and boundaries
- Specs, requirements, and acceptance criteria nailed at both high- and mid-level
At the end, everyone knew what was talking to what, what “done” meant, and where the edges were.
What I’m thinking about now is basically that process, but agentized and wired into the tooling:
- Any ticket is an entry point into a graph, not just a blob of text.
- Epics ↔ tasks ↔ subtasks
- Linked specs / decisions / notes
- Files and PRs that touched the same areas
- Standards live as versioned docs, not just a random Agents.md:
- Markdown (with diagrams) that declares where it applies: tags, ticket types, modules.
- Tickets can pin those docs via labels/tags/links.
- From the agent’s perspective, the UI is just a viewer/editor.
- The real surface is an API: “given this ticket, type, module, and tags, give me all applicable standards, related work, and code history” (roughly sketched after this list).
- The agent then plays something like the analyst + senior engineer role:
- Pulls in the right standards automatically
- Proposes acceptance criteria and subtasks
- Explains why a file looks the way it does by walking past tickets / PRs / decisions
So it’s less “LLM stapled to an issue tracker” and more “that old XP inception + thin-slice discipline, encoded as a graph the agent can actually reason over.”
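A rough sketch of that query surface, with every name invented just to show the shape (nothing here is any particular tracker's API):

```python
# Ticket-as-graph-entry-point sketch: standards declare where they apply,
# tickets carry type/module/tags, and one query hands the agent everything relevant.
from dataclasses import dataclass, field

@dataclass
class StandardDoc:
    slug: str               # e.g. "logging-standards"
    applies_to: set[str]    # tags, ticket types, or modules it declares itself for
    body_md: str            # versioned markdown, diagrams and all

@dataclass
class Ticket:
    key: str
    type: str               # "epic" | "task" | "subtask"
    module: str
    tags: set[str]
    links: list[str] = field(default_factory=list)  # specs, decisions, PRs, sibling tickets

def context_for(ticket: Ticket, standards: list[StandardDoc]) -> dict:
    """Given this ticket, type, module, and tags, return everything that applies."""
    selectors = {ticket.type, ticket.module} | ticket.tags
    return {
        "standards": [s.slug for s in standards if s.applies_to & selectors],
        "related_work": ticket.links,
        # code history would come from the PR index or `git log` over the module's paths
    }
```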
Has any project tried forcing a planning layer as //TODO comments all throughout the code before making any changes? Small loops, like one //TODO at a time? What about limiting changes to one function at a time to stay focused? Or is everyone a slave to however the model was designed, and currently they’re designed for giant one-shot generations only?
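The loop itself would be tiny - a toy sketch of the "one //TODO at a time" idea, with the actual model edit stubbed out:

```python
# Toy "planning layer as //TODO" loop: plant TODO[plan] markers first, then make
# exactly one focused change per iteration.
import pathlib
import re

TODO = re.compile(r"//\s*TODO\[plan\]:(.+)")

def next_todo(root: str) -> tuple[pathlib.Path, str] | None:
    """Find the first outstanding planning TODO anywhere in the tree."""
    for path in sorted(pathlib.Path(root).rglob("*.ts")):
        match = TODO.search(path.read_text())
        if match:
            return path, match.group(1).strip()
    return None

def edit_one_todo(path: pathlib.Path, instruction: str) -> None:
    # Placeholder: the real version would hand just this file (or function) plus
    # the instruction to the model and apply its patch. Here we only strip the
    # marker so the loop terminates.
    path.write_text(TODO.sub("", path.read_text(), count=1))
    print(f"{path}: {instruction}")

# One //TODO per iteration: the model never sees more than one slice of scope.
while (item := next_todo("src")) is not None:
    edit_one_todo(*item)
```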
Is it possible that all local models need to get better is more context, used to make simpler, smaller changes at a time? I haven't seen enough specific comparisons of how local models fail vs. the expensive cloud models.
I did find beads helpful for some of these multi-context-window tasks. It sounds a little like there is some convergence between what they are suggesting and how it gives you lightweight subtasks that survive a /clear.
> It sounds a little like there is some convergence between what they are suggesting and how it gives you lightweight subtasks that survive a /clear.
I do see the convergence there. Beads gives you that "state that survives `/clear`," and Anthropic’s harness tries to do something similar at a higher level.
I've been thinking about this with a pretty simple, old-school analogy:
You're at a shop with solid engineering and ticketing practices. You just hired a great junior developer. They know the stack, maybe even the domain basics, but they don't yet know:
- Your business processes
- The quirks of your microservices
- Local naming conventions, standards, etc.
- Team norms around testing, logging, and observability
You trust them with important tasks, but expect their context will frequently get blown away by interruptions, meetings, task-switching, and long weekends. To handle this, you need to make sure each ticket or note contains enough structured info that when they inevitably lose context, they can pick right back up.
For each ticket, you'd likely include:
- Personas and user goals
- Acceptance criteria, Given/When/Then scenarios
- Links to specs, documentation, related tickets, or prior art
- A short summary of their current understanding
- Rough plan (steps, what's done/not done)
- Decisions made and their rationale ("I chose X because Y")
- Open questions or known gotchas
End of day Friday, that junior would ideally leave notes that answer:
"If I have total amnesia next Tuesday, what's the minimum needed to quickly reload my context?"
To me, agent harnesses like Anthropic's or Beads are just formalizing exactly this pattern:
- `/clear` or `/new` is like a "long weekend brain wipe."
- Persistent subtasks or controllers become structured scaffolding.
- The crucial piece isn't remembering everything, just clearly capturing intent, decisions, rationale, and immediate next steps.
My confusion about Anthropic’s approach is why they're doing this over plain text files or JSON, instead of leveraging decades of existing tracker and project-management tooling, which already encodes exactly this workflow and these best practices.
Why so difficult?