well with the recent delays i can easily find claude code going off on its own for 20 minutes with no idea what it's going to come back with. one time it overflowed its context on a simple question, then used up the rest of my session window. ime a lot of ai assistants have this awkward thing where they complicate something invisibly, think for a long time burning up context, and then come back with a summary based on some misconception.
The key is a well-defined task with strong guardrails. You can add these to your agents file over time, or you can probably just find someone's online to copy the basics from. Any time you find it doing something you didn't expect or don't like, add a guardrail to prevent that in the future. Claude hooks are also useful here, along with the hookify plugin, which creates them for you based on the current conversation.
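For a concrete sense of what a hook-based guardrail looks like, here is a sketch from memory of a `PreToolUse` hook in `.claude/settings.json` (the exact schema and the `jq` filter are assumptions; check the current Claude Code docs before copying). A hook command that exits with code 2 blocks the tool call, so you can veto command patterns that have burned you before:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "jq -e '.tool_input.command | test(\"rm -rf\") | not' > /dev/null || exit 2"
          }
        ]
      }
    ]
  }
}
```

The hook receives the pending tool call as JSON on stdin; here `jq -e` fails when the shell command contains `rm -rf`, and the `|| exit 2` turns that into a block. The nice thing about hooks over agents-file prose is that they're deterministic: the model can't talk itself past them.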
For complex tasks I ask ChatGPT or Grok to define the context, then take it to Claude for accurate execution. I also created a complete pipeline to use locally, enriched with skills, agents, RAG, and profiles. It is slower but very good. There is no magic: the richer the context window, the more precise and contained the execution.