> After about 3-4k lines of code I completely lost track of what is going on... Overall I would say it was a horrible experience, even though it took 10 hours to write close to 10000 lines of code
It's hard to take very much away from somebody else's experiences in this area, because if you've been doing a substantial amount of AI coding this year, you know that the experience is highly dependent on your approach.
How do you structure your prompts? How much planning do you do? How do you do that planning? How much review do you do, and how do you do it? Just how hands-on or hands-off are you? What's in your AGENTS.md or equivalent? What other context do you include, when, why, and how? What's your approach to testing, if any? Do you break down big projects into smaller chunks, and if so, how? How fast vs slow are you going, i.e. how many lines of code are you letting the AI write in any given time period? Etc.
The answers to these questions vary wildly from person to person.
But I suspect a ton of developers who are having terrible experiences with AI coding are quite new to it, have minimal systems in place, and are trying "vibe coding" in the original sense of the phrase, which is to rapidly prompt the LLM with minimal guidance and blindly trust its code. In which case, yeah, that's not going to give you great results.
I spent considerable time trying to coax decent code out of the agentic systems. The thing that struck me most is how creative they are at finding new ways to fail and make me adjust my prompt.
It got tiring, so I'm on a break from AI coding until I have the bandwidth to build my own agent. I don't think this is something we should be outsourcing to the likes of OpenAI, Microsoft, Anthropic, Google, Cursor, et al. Big Tech has shown that its priorities lie somewhere other than our success and well-being.
Exactly my experience too. I'm now using AI maybe 25% of the time or less. I always get to a point where I see that agentic coding is making me not want to actually think, and there's no way anyone can convince me that that is a superior approach, because every time I took days off from the agents to actually think, I came up with a far superior architecture and code that rendered much of what the agents were hammering away at moot.
Agentic coding is like a drug or a slot machine: it slowly draws you in with the implicit promise of getting much for little. The only way it is useful to me now is for very focused tasks where I have spent a lot of time defining the architecture down to the last detail, and the agents are used to fill in the blanks, as it were.
I also think I could write a better agent, and why the big corps have not done so is baffling to me. Even getting the current agents to obey the guidelines in the agent.md files is a struggle. They forget pretty much everything two prompts down the line. Why can't the CLI systematically prompt them to check every time, etc.?
Something tells me the future is about domain-aware agents that help users wring better performance out of the models, based on domain-specific deterministic guardrails.
I've had experiences like this before, but if that's the ONLY experience you've had, or if you have that experience 75% of the time, I think you're doing something wrong. Or perhaps you're just in a very different coding domain than I am (web dev, HTML/CSS/JS) where the AI happens to suck.
The biggest mistakes imo are:
1. Underplanning. Trying to do huge projects in one go, rather than breaking them down into small projects, and breaking those small projects down into well thought out plans.
2. Too much focus on prompting rather than context. Prompt engineering is obsessing over the perfect way to say or phrase something, whereas context engineering is putting relevant information into the LLM's working memory, which requires you to go out and gather that info (or use the LLM to get it).
I've had my share of good and bad experiences; one section of an existing project is more than 90% AI-created. How you say things is just as important as the context you provide, partly because the agents will start trying to decide what is and is not good context, which they are unreliable at doing, even after you give them the limited context and tell them not to edit other files or bring in more context. For example, if you use a lot of colloquial phrases, you activate that area of the network, taking away from using other parts (MoE activation, and at lower levels too).
They are not good readers (see research results around context collapse and context poisoning)
If we take Elon Musk's approach to challenging engineering problems, which in this exact order is:
1. Question every requirement
2. Delete any part of the process you can
3. Simplify and optimize
4. Accelerate cycle time
5. Automate
In my experience, coding agents at the moment are really good at 4 and 5, and they absolutely suck at 1 and 2.
They are okay at 3 if prompted well.
Humans are okay at 1 and 2 IF they understand the system well and critically question requirements. With LLM-generated codebases this system understanding is often missing, so you can't even start with 1.
If we're talking about emulating users, sure, but this is supposed to be a tool that helps me get my job done.
If you dig into how something like Copilot works, for example, they do dumb things like ask^ the LLM to do glob matching after a file read (to pull in more instructions)... just use a damn glob library instead of a non-deterministic method that is known to be unreliable.
^ it's just a table in the overall context, so "asking" is a bit anthropomorphizing
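To make that concrete, here's a minimal sketch of the deterministic alternative I mean. The `applyTo:` front-matter format is invented for illustration (real tools encode this differently), and `fnmatch` semantics aren't identical to gitignore-style globs, but the point is that the matching itself never needs an LLM:

```python
# Sketch: pick the instruction files that apply to a just-read source file
# with a real glob/fnmatch library, instead of asking the LLM whether the
# pattern matches. "applyTo:" is a hypothetical front-matter convention.
from fnmatch import fnmatch
from pathlib import Path

def applicable_instructions(source_file: str, instructions_dir: str) -> list[Path]:
    """Return instruction files whose declared glob matches source_file."""
    matches = []
    for inst in sorted(Path(instructions_dir).glob("*.md")):
        lines = inst.read_text(encoding="utf-8").splitlines()
        header = lines[0] if lines else ""
        if header.startswith("applyTo:"):  # e.g. "applyTo: src/**/*.ts"
            pattern = header.split(":", 1)[1].strip()
            if fnmatch(source_file, pattern):
                matches.append(inst)
    return matches

# Usage: pull the matching files into context after a file read.
# extra = [p.read_text() for p in applicable_instructions("src/app/main.ts", ".instructions")]
```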
> ^ it's just a table in the overall context, so "asking" is a bit anthropomorphizing
I interpreted GP as just saying that you are already anthropomorphizing too much by supposing that the models "find" new ways to fail (as if trying to defy you).
most humans do not seek out ways to defy after a certain age
I did not mean to imply active choice by "find"; more that they are reliably non-deterministic and have a hard time sticking to, or an easy time ignoring, the instructions I did write.
I think you're making a fair comment, but it still irks me that you're quite light on details on what the "correct" approach is supposed to be, and it irks me also because it seems to now be a pattern in the discussion.
Someone gives a detailed-ish account of what they did, and that it didn't work for them, and then there are always people in the comments saying that you were doing it wrong. Fair! But at this point, I haven't seen any good posts here on how to do it _right_.
This dynamic reminds me of an experience I had a year ago, when I went down a Reddit rabbit hole related to vitamins and supplements. Every individual in a supplement discussion has a completely different supplement cocktail that they swear by. No consensus ever seems to be reached about what treatment works for what problem, or how any given individual can know what's right for them. You're just supposed to keep trying different stuff until something supposedly works. One must exquisitely adjust not only the supplements themselves, but the dosage and frequency, and a bit of B might be needed to cancel out a side effect of A, except when you feel this way you should do this other thing, etc etc etc.
I eventually wrote the whole thing off as mostly one giant choose-your-own-adventure placebo effect. There is no end to the epicycles you can add to "perfect" your personal system.
Try using Spec Kit. Codex 5 high for planning; Claude Code with Sonnet 4.5 for implementation; Codex 5 high for checking the implementation; back to Claude Code for addressing feedback from Codex; ask Claude Code to create a PR; read the PR description to ensure it tracks your expectations.
There’s more you’ll get a feel for when you do all that. But it’s a place to start.
Speaking as someone for whom AI works wonderfully, I’ll be honest: the reason I’ve kept things to myself is that I don’t want to be attacked and ridiculed by the haters. I do want to share what I’ve learned, but I know that everything I write will be picked apart with a fine-toothed comb, and I have no interest in exposing myself to the toxicity that comes with such behavior.
Relentlessly break things down. Never give the LLM a massive, complex project. You should be subdividing big projects into smaller projects, or into phases.
Planning is 80% of the battle. If you have a well-defined plan, that defines the architecture well, then your LLM is going to stick to that plan and architecture. Every time my LLM makes mistakes, it's because there were gaps in my plan, and my plan was wrong.
Use the LLM for planning. It can do research. It can brainstorm and then evaluate different architectural approaches. It can pick the best approach. And then it can distill this into a multi-phased plan. And it can do this all way faster than you.
Store plans in Markdown files. Store progress (task lists) in these same Markdown files. Ensure the LLM updates the task lists as you go with relevant information. You can @-mention these files when you run out of context and need to start a new chat.
When implementing a new feature, part of the plan/research should almost always be to first search the codebase for similar things and take note of the patterns used. If you skip this step, your LLM is likely to unnecessarily reinvent the wheel.
Learn the plan yourself, especially if it's an ambitious one. I generally know what my LLM is going to do before it does it, because I read the plan. Reading the plan is tedious, I know, so I generally ask the LLM to summarize it for me. Depending on how long the plan is, I tell it to give me a 10-paragraph or 20-paragraph or 30-paragraph summary, with one sentence per paragraph, and blank lines in between paragraphs. This makes the summary very easy to skim. Then I reply with questions I have, or requests for it to make changes to the plan.
When the LLM finishes a project, ask it to walk you through the code, just like you asked it to walk you through the plan ahead of time. I like to say, "List each of the relevant code execution paths, then walk me through each one, one step at a time." Or, "Walk me through all the changes you made. Use concentric circles of explanation that go from broad to specific."
Put your repeated instructions into Markdown files. If you're prompting the LLM to do something repeatedly, e.g. asking the LLM to make a plan, to review its work, to make a git commit, etc., then put those instructions in prompt Markdown files and just @-mention them when you need them, instead of typing them out every time. You should have dozens of these over time. They're composable, too, as they can link to each other. When the LLM makes mistakes, go tweak your prompt files. They'll get better over time.
Organize your code by feature not by function. Instead of putting all your controllers in one folder, all your templates in another, etc., make your folders hold everything related to a particular feature.
When your codebase gets large enough, and you have more complex features that touch more parts of the code, have the LLM write doc files on them. Then @-mention those doc files whenever working on these features or related features. They'll help the LLM be more accurate at finding what it needs, etc.
I could go on.
If you're using these tools daily, you'll have a similar list before long.
Thanks! I got some useful things out of your suggestions (generate plan into actual files, have it explain code execution paths), and noted that I already was doing a few of those things (asking it to look for similar features in the code).
This is a good list. Once the plan is in good shape, I clear the context and ask the LLM to evaluate the plan against the codebase and find the flaws and oversights. It will always find something to say but it will become less and less relevant.
Yes, I do this too. It has a strong bias toward always wanting to make a change, even if it's minor or unnecessary. This gets more intense as the project gets more complex. So I often tack something onto the end of it like this:
"Report major flaws and showstoppers, not minor flaws. By the way, this is my fourth time asking you to review this plan. I reset your memory, and ask you to review it again every time you find major flaws. I will continue doing so until you don't find any. Fingers crossed that this time is it!"
I haven't done any rigorous testing to prove that this works. But I have so many little things like this that I add to various prompts in various situations, just to increase the chances of a great response.
I think it's hard because it's quite artistic and individualistic, as silly as that may sound.
I've built "large projects" with AI, which is 10k-30k lines of algorithmic code and 50k-100k+ lines of UI/Interface.
I've found a few things to be true (that aren't true for everyone).
1. The choice of model (strengths and weaknesses) and OS dramatically affects how you must approach problems.
2. Being a skilled programmer/engineer yourself will allow you to slice things along areas of responsibility, domains, or other directions that make sense (for code size, context preservation, and being able to wrap your head around it).
3. For anything where you have a doubt, ask 3 or more models -- have them write their findings down in a file each -- and then have 3 models review the findings with respect to the code. More often than not, you march towards consensus and a good solution.
4. GPT-5-Codex via the OpenAI Codex CLI on Linux/WSL was, for me, the most capable model for coding, while Claude is the most capable for quick fixes and UI.
5. Tooling and ways to measure "success" are imperative. If you can't define the task in a way where success is easy to determine, neither a human nor an AI would complete it satisfactorily. You'll find that most engineering tasks are laid out in a very "hand-wavy" way -- particularly UI tasks. Either lay it out cleanly or expect to iterate.
6. AI does not understand the physical/visual world. It will fail hard on things which have an implied understanding. For instance, it will not automatically intuit the implication of 50 parallel threads trying to read from an SSD -- unless you guide it. Ditto for many other optimizations and usage patterns where code meets real-world. These will often be unique and interesting bugs or performance areas that a good engineer would know straight out.
7. It's useful to have non-agentic tools that can perform massive codebase analysis for tough problems. Even at 400k tokens of context, a large codebase can quickly become unwieldy. I have built custom Python tools (pretty easy) to do things like "get all files of a type recursively and generate a context document to submit with my query" (a sketch of what I mean follows this list). You then query GPT-5-high, Claude Opus, Gemini 2.5 Pro and cross-check.
8. Make judicious use of Git. The pattern doesn't matter, just have one. My pattern is to commit after every working agentic run (let's say a feature). If it's a fail and taking more than a few turns to get working -- I scrap the whole thing and re-assess my query or how I might approach or break down the task.
9. It's up to you to guide the agent on the most thoughtful approaches -- this is the human aspect. If you're using Cloud Provider X and they provide cheap queues then it's on you to guide your agent to use queues for the solution rather than let's say a SQL db -- and it's on you to understand the tradeoffs. AI will perhaps help explain them but it will never truly understand your business case and requirements for reliability, redundancy, etc. Perhaps you can craft queries for this but this is an area where AI meets real world and those tend to fail.
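Re: point 7, here's a minimal sketch of the kind of context-gathering tool I mean. The CLI arguments and the separator format are placeholders, not a prescription:

```python
# Sketch: recursively collect all files of one type and concatenate them
# into a single context document to paste into (or attach to) a query.
import sys
from pathlib import Path

def build_context_document(root: str, extension: str, out_path: str) -> None:
    """Concatenate every matching file under root into one context file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(root).rglob(f"*{extension}")):
            out.write(f"\n===== {path} =====\n")  # header so the model can cite files
            out.write(path.read_text(encoding="utf-8", errors="replace"))
            out.write("\n")

if __name__ == "__main__":
    # e.g. python build_context.py src .py context.txt
    build_context_document(sys.argv[1], sys.argv[2], sys.argv[3])
```

From there it's a copy-paste into GPT-5-high, Claude Opus, or Gemini 2.5 Pro along with the actual question.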
One more thing I'd add is that you should make an attempt to fix bugs in your 'new' codebase on occasion. You'll get an understanding for how things work and also how maintainable it truly is. You'll also keep your own troubleshooting skills from atrophying.
Still waiting to see that large, impressive, complex, open-source project that was created through vibe coding / vibe engineering / whatever gimmicky phrase they come up with next!
If "large and impressive" means "has grown to that size via many contributions from lots of random developers", then I'd agree.
I don't think there is much doubt AI can spit out a lot of code that mostly works. It's not too hard to imagine that one day an AI can produce so much code that it's considered a "large, complex project". A single mind dedicated to a task can do remarkable things, be it human or silicon. Another mind reading what they have done, and understanding it, is another thing entirely.
All long-term, large projects I'm familiar with have been developed over a long time by many contributors, and as a consequence there has been far more reading and understanding going on than writing of new code. This almost becomes self-evident when you look at large open source projects, because the code quality is so high. Everything is split into modules a single mind can pick up relatively quickly and work on in isolation. Hell, even compiler error messages become self-explanatory essays over time.
Or to put it another way, no open source project is a ball of mud. Balls of mud can only be maintained by the person who wrote them, who gets away with it because they have most of the details stored in their context window, courtesy of writing it. Balls of mud are common in proprietary code (I've worked on a few). They are produced by a single small group who were paid to labour away for years at one task. And now, if this post is to be believed, AI vibe-coded projects are also a source of balls of mud. Given current AIs are notoriously bad at modifying even well-structured projects, they won't be maintainable by anyone.
After all of that effort, is it faster than coding stuff yourself? This feels like getting into project management because you don't want to learn a new library.
All that effort, and the writing of very specific prompts in very specific ways in order to create deterministic output, just feels like a bad version of a programming language.
If we're not telling the computer exactly what to do, then we're leaving the LLM to make (wrong) assumptions.
If we are telling the computer exactly what to do via natural language then it is as complicated as normal programming if not more complicated.
One of the most frustrating (but common) things is you do v1. It looks good enough.
Then you go to tweak it a little (say move one box 10-15 pixels over, or change some text sizing or whatever), and it loses its mind.
So then you spend the next several days trying every possible combination of random things to get it to actually move the way you want. It ends up breaking a bunch of other things in the process.
Eventually, you get it right, and then never ever want to touch it ever again.
These are all just stopgaps, this tech is still in its infancy. If it keeps improving, it will reach a point where it can implement complex things from simple prompts, the way that talented programmers can.
I’m not a fan of a lot of this AI stuff, but there is no reason to expect it won’t get to that level.
Talented programmers often don't get things right though. They can make the wrong assumptions about what a product person wants or what a client wants.
And that normally stems from lack of information or communication problems.
It’s magical thinking to think it won’t! We already have one example of it being possible (in humans). Unless you think humans have a “soul” or some other intangible element, then there’s no reason they can’t be emulated.
Personally, I find it faster if I use LLMs for the use cases I've found them to work well.
One example is just laborious typing-heavy stuff. Like I recently needed a table converted to an enumeration. Five years ago I'd have spent half a day figuring out a way to sed/awk/perl that transformation. Now I can entertain an AI for half an hour or so to either do the transformation (which is easy to verify) or to set up a transformation script.
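For that kind of job, the transformation script the AI sets up can be tiny. A hypothetical example (the two-column CSV layout and the target enum are invented for illustration, not my actual case):

```python
# Hypothetical throwaway script: turn a name,value table into a Python Enum.
import csv

def table_to_enum(csv_path: str, enum_name: str) -> str:
    """Emit Python Enum source code from a two-column name,value CSV."""
    lines = [f"from enum import Enum\n\nclass {enum_name}(Enum):"]
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank/malformed rows
            name, value = row[0].strip().upper(), row[1].strip()
            lines.append(f"    {name} = {value}")
    return "\n".join(lines)

print(table_to_enum("status_codes.csv", "StatusCode"))
```

Easy to verify by eyeballing the output, which is the whole point.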
Or I enjoy that I can give an LLM a problem and the 2-3 solution approaches I'd see, and get back 4-5 examples of how the code would look in those approaches, and some more. Again, this would take me 1-2 days, and I might not see some of the more creative approaches. Those approaches might also be complete nonsense, mind you.
But generating large amounts of code just won't be a good, time-efficient idea long-term if you have to support and change it. A lot of our code base is rather simple python, but it carries a lot of reasoning and thought behind it. Writing that code is not a bottleneck at all.
Yes, it often is much faster, and significantly so.
There are also times where it isn't.
Developing the judgment for when it is and isn't faster, and when it's likely to do a good job vs. when it isn't, is pretty important. But also, how good a job it does is often a skill issue, too. IMO the most important and overlooked skill is having the foresight and the patience to give it the context it needs to do a good job.
I'm not sure. I think it's asymmetric: high upside potential, but low downside.
Because when the AI isn't cutting it, you always have the option to pull the plug and just do it manually. So the downside is bounded. In that way it's similar to the Mitch Hedberg joke: "I like an escalator, because an escalator can never break. It can only become stairs."
The absolute worst-case scenario is situations where you think the AI is going to figure it out, so you keep prompting it, far past the time when you should've changed your approach or given up and done it manually.
The thing is, it's not actually that much effort. A day of work to watch some videos and set things up, and then the only issue is that it's another thing to remember. But we developers remember thousands of arcane incantations. This isn't any harder than any of the other ones, and when applied correctly it writes code very, very quickly.
Has the definition of "vibe coding" changed to represent all LLM-assisted coding? Because from how I understand it, what you're talking about is not "vibe coding."
> How do you structure your prompts? How much planning do you do? How do you do that planning? How much review do you do, and how do you do it? Just how hands-on or hands-off are you? What's in your AGENTS.md or equivalent? What other context do you include, when, why, and how? What's your approach to testing, if any? Do you break down big projects into smaller chunks, and if so, how? How fast vs slow are you going, i.e. how many lines of code are you letting the AI write in any given time period? Etc.
It wouldn't be vibe coding if one did all that ;-)
The whole point of vibe coding is letting the LLM run loose, with minimal checks on quality.
Original definition (paraphrased):
"Vibe coding describes a chatbot-based approach to creating software where the developer describes a project or task to a large language model (LLM), which generates code based on the prompt. The developer does not review or edit the code, but solely uses tools and execution results to evaluate it and asks the LLM for improvements. Unlike traditional AI-assisted coding or pair programming, the human developer avoids examination of the code, accepts AI-suggested completions without human review, and focuses more on iterative experimentation than code correctness or structure."
OK. I guess strictly speaking, you could do most of what you're suggesting and still call it vibe coding.
I used agentic LLM dev tools to build the core of my webapp. It took months, I had a QA person at the beginning, and I looked at every line of committed code. It was a revelatory experience and resulted in a very reliable webapp.
Last month I thought: "OK, I have all kinds of rules, guardrails, and I am relatively excellent at managing context. Let's try to 'vibe code' some new features."
It has been a total disaster and worse than a waste of time. I keep finding entirely new weird bugs it created. This is just a React/Vite/Supabase app, nothing nuts. The worst part is that I showed these vibed features to stakeholders, and they loved it. Now I have to explain why recreating these features is going to take much longer.
I knew better, as the magic of vibe coding is to explore the MVP space, and I still fell for it.
Coding with an AI is an amplifier. It'll amplify your poor planning just as much as it amplifies your speed at getting some coding task done.
An unplanned problem gets amplified 10-100x compared to coding things slowly, by hand. That's when the AI starts driving you into Complexity Corner™ (LOL) to work around the lack of planning.
If all you're ever doing is using prompts like, `write a function to do...` or `write a class...` you're never going to run into the sorts of super fucking annoying problems that people using AI complain the most about.
It's soooooo tempting to just tell the AI to make complete implementations of things and say to yourself, "I'll clean that up later." You make so much progress so fast this way it's unreal! Then you hit Complexity Corner™ where the problem is beyond the (current) LLM's capabilities.
Coding with AI takes discipline! Not just knowledge and experience.
I agree, but would maybe argue that the level of instructions can be slightly higher level than "write function" or "write class" without ending up in a veritable cluster fuck, especially if guard rails are in place.
It's far more than planning. You have to "get to know your LLM" and its quirks so you know how to write for it, and when they release new updates (cut-off time or version), you have to do it again. Same for the agentic frameworks, when they change their system prompts and such.
It's a giant, non-deterministic, "let's see what works based on our vibes" mess of an approach right now. Even within the models, architecturally, there are recent results that indicate people are trying out weird things to see if they work; it's unclear if these are coming from first-principles intuition and hypothesis formation, or from just throwing things at the wall to see what sticks.
> Since vibe coding is so chaotic, rigorous planning is required, which not every developer had to do before.
I do believe the problem is different:
I think I am pretty good at planning, but I have a tendency (in particular for private projects) to work on things where the correctness requirements are very high.
While I can describe very exactly what the code is supposed to do, even small errors can make the code useless. If the foundations are not right, it will be complicated to detect errors in the higher levels (to solve this issue, I implement lots of test cases).
Also, I often have a very specific architecture for my code in mind. If the AI tries to do things differently, the code can easily become much less useful. In other words: concerning this point, exactly because I plan things carefully (as you claimed), the AI becomes much less useful if it "does its own thing" instead of following my plan.
Yes of course, have the AI write the agents.md file which the AI can then use to make changes to the project. This of course works better than just having the AI write changes to the project directly.
I do not, but I don't do that for LLMs either. Conventions and documentation I write and present are as succinct or lengthy as they need to be, no matter if the recipient is human or machine.
I mean I had my LLM generate most of my AGENTS.md, and I tweak it maybe once every week or two. It's minimal investment, and it's a gift that keeps on giving.
There is no "working prompt". There is context that is highly dependant on the task at hand. Here are some general tips:
- tell it to ask you clarifying questions, repeatedly. it will uncover holes and faulty assumptions and focus the implementation once it gets going
- small features, plan them, implement them in stages, commit, PR, review, new session
- have conventions in place, coding style, best practices, what you want to see and don't want to see in a codebase. we have conventions for python code, for frontend code, for data engineering etc.
- make subagents work for you, to look at a problem from a different angle (and/or from within a different LLM altogether)
- be always critical and dig deeper if you have the feeling that something is off or doesn't make sense
- good documentation helps the machine as well as the human
I suspect you've gotten lucky. I do a lot of planning and prompt editing and have plenty of outrageous failures that don't make any sense given the context.
I completely agree with this approach. I just finished an intensive coding session with Cursor, and my workflow has evolved significantly. Previously, I'd simply ask the AI to implement entire features and copy-paste code until something worked.
Now I take a much more structured approach: I scope changes at the component level, have the agent map out dependencies (state hooks, etc.), and sometimes even use a separate agent to prototype the UI before determining the necessary architecture changes. When tackling unfamiliar territory, I pause and build a small toy example myself first before bringing Cursor into the picture.
This shift has been transformative. I used to abandon projects once they hit 5K lines because I'd get lost in complexity. Now, even though I don't know every quirk of my codebase, I have a clear mental model of the architecture and understand the key aspects well enough to dive in, debug, and make meaningful progress across different parts of the application.
What's interesting is that I started very deliberately—slowly mapping out the architecture, deciding which libraries to use or avoid, documenting everything in an agent.md file. Once I had that foundation in place, my velocity increased dramatically. It feels like building a castle one LEGO brick at a time, with Cursor as my construction partner.