I feel like I'm taking crazy pills. The article starts with:
> you give it a simple task. You’re impressed. So you give it a large task. You’re even more impressed.
That has _never_ been the story for me. I've tried, and I've gotten some good pointers and hints about where to go and what to try (a result of LLMs' extensive if shallow reading), but when it comes to concrete problem solving or code/script writing, I'm _always_ disappointed. I've never gotten a satisfactory code/script result from them without a tremendous amount of pushback: "do this part again with ...", do that, don't do that.
Maybe I'm just a crank with too many preferences. But I hardly think so. The minimum requirement should be for the code to work. It often doesn't. Feedback helps, sure. But if you've got a problem where a simple, contained feedback loop isn't that easy to build, the only source of feedback is yourself. And that's when you are exposed to the stupidity of current AI models.
I usually do most of the engineering and it works great for writing the code. I’ll say:
> There should be a TaskManager that stores Task objects in a sorted set, with the deadline as the sort key. There should be methods to add a task and pop the current top task. The TaskManager owns the memory when the Task is in the sorted set, and the caller to pop should own it after it is popped. To enforce this, the caller to pop must pass in an allocator and will receive a copy of the Task. The Task will be freed from the sorted set after the pop.
> The payload of the Task should be an object carrying a pointer to a context and a pointer to a function that takes this context as an argument.
> Update the tests and make sure they pass before completing. The test scenarios should relate to the use-case domain of this project, which is home automation (see the readme and nearby tests).
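For concreteness, here is roughly the shape I expect to get back from a prompt like that, sketched in C since the prompt talks about allocators and ownership; every name below is illustrative, not from a real project, and a linked list stands in for a real sorted set:

```c
/* Minimal sketch only: invented names, linked list in place of a sorted set. */
#include <stdlib.h>
#include <string.h>

typedef struct {
    void *ctx;              /* pointer to a context object               */
    void (*fn)(void *ctx);  /* pointer to a function taking that context */
} Payload;

typedef struct Task {
    long deadline;          /* sort key                                  */
    Payload payload;
    struct Task *next;
} Task;

typedef struct {
    void *(*alloc)(size_t); /* caller-supplied allocator used by pop     */
} Allocator;

typedef struct {
    Task *head;             /* ascending by deadline; NULL when empty    */
} TaskManager;

/* The TaskManager owns the Task's memory while it sits in the sorted set. */
int task_manager_add(TaskManager *tm, long deadline, Payload payload) {
    Task *t = malloc(sizeof *t);
    if (!t) return -1;
    t->deadline = deadline;
    t->payload = payload;

    Task **slot = &tm->head;
    while (*slot && (*slot)->deadline <= deadline)
        slot = &(*slot)->next;
    t->next = *slot;
    *slot = t;
    return 0;
}

/* Pop the task with the earliest deadline. The caller passes an allocator,
 * receives a copy it now owns, and the manager frees its own copy. */
Task *task_manager_pop(TaskManager *tm, Allocator a) {
    Task *top = tm->head;
    if (!top) return NULL;

    Task *copy = a.alloc(sizeof *copy);
    if (!copy) return NULL;
    memcpy(copy, top, sizeof *copy);
    copy->next = NULL;

    tm->head = top->next;
    free(top);
    return copy;
}
```

The prompt pins down the data layout and the ownership rules; the agent fills in the mechanical parts and the tests.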
I feel that with such an elaborate description you aren't too far away from writing that yourself.
If that's the input needed, then I'd rather write the code myself and rely on smarter autocomplete, so that while I write the code and think about it, I can judge whether the LLM is doing what I mean to do, or straying away from something reasonable to write and maintain.
Yeah, I feel like I get really good results from AI, and this is very much how I prompt as well. It just takes care of writing the code and making sure to update everything that code touches, guided by linters and type-checkers, but it's always executing my architecture and algorithm, and I spend time carefully trying to understand the problem before I even begin.
But this is what I don't get. Writing code is not that hard. If the act of physically typing my code out is a bottleneck to my process, I am doing something wrong. Either I've under-abstracted, or over-abstracted, or flat out have the wrong abstractions. It's time to sit back and figure out why there's a mismatch with the problem domain and come back at it from another direction.
To me this reads like people have learned to put up with poor abstractions for so long that having the LLM take care of it feels like an improvement? It's the classic C++ vs Lisp discussion all over again, but people forgot the old lessons.
It's not that hard, but it's not that easy. If it was easy, everyone would be doing it. I'm a journalist who learned to code because it helped me do some stories that I wouldn't have done otherwise.
But I don't like to type out the code. It's just no fun to me to deal with what seem to me arbitrary syntax choices made by someone decades ago, or to learn new jargon for each language/tool (even though other languages/tools already have jargon for the exact same thing), or to wade through someone's undocumented code to understand how to use an imported function. If I had a choice, I'd rather learn a new human language than a programming one.
I think people like me, who (used to) code out of necessity but don't get much gratification out of it, are one of the primary targets of vibe coding.
I'm pretty damn sure the parent, by saying "writing code", meant the physical act of pushing down buttons to produce text, not the problem-solving process that precedes writing said code.
This. Most people defer the solving of hard problems to when they write the code. This is wrong, and too late to be effective. In one way, using agents to write code forces the thinking to occur closer to the right level - not at the code level - but in another way, if the thinking isn’t done or done correctly, the agent can’t help.
I can spend all the time I want inside my ivory tower, hatching out plans and architecture, but the moment I start hammering letters in the IDE my watertight plan suddenly looks like Swiss cheese: constraints and edge cases that weren't accounted for during planning, flows that turn out to be unfeasible without a clunky implementation, etc...
That's why writing code has become my favorite method of planning. The code IS the spec, and English is woefully insufficient when it comes to precision.
This makes agentic workflows even worse, because you'll only discover your architectural flaws much, much later in the process.
I also think this is why AI works okay-ish on tiny new greenfield webapps and absolutely doesn't on large legacy software.
You can't accurately plan every little detail in an existing codebase, because you'll only find out about all the edge cases and side effects when trying to work in it.
So, sure, you can plan what your feature is supposed to do, but your plan of how to do that will change the minute you start working in the codebase.
Yeah, I think this is the fundamental thing I'm trying to get at.
If you think through a problem as you're writing the code for it, you're going to end up going up the wrong creek, because you'll have been furiously rowing head down the entire time, paying attention to whatever local problem you were solving or whatever piece of syntax or library trivia or compiler satisfaction game you were playing instead of the bigger picture.
Obviously, before starting writing, you could sit down and write a software design document that worked out the architecture, the algorithms, the domain model, the concurrency, the data flow, the goals, the steps to achieve it and so on; but the problem with doing that without an agent is that then it becomes boring. You've basically laid out a plan ahead of time and now you've just got to execute on the plan, which means (even though you might even fairly often revise the plan as you learn unknown unknowns or iterate on the design) that you've kind of sucked all the fun and discovery out of the code-writing process. And it sort of means that you've essentially implemented the whole thing twice.
Meanwhile, with a coding agent, you can spend all the time you like building up that initial software design document, or specification, and then you can have it implement that. Basically, you can spend all the time in your hammock thinking through things and looking ahead, but then have that immediately directly translated into pull requests you can accept or iterate on instead of then having to do an intermediate step that repeats the effort of the hammock time.
Crucially, this specification or design document doesn't have to remain static. As you discover problems or limitations or unknown unknowns, you can revise it and keep executing on it, meaning it's living documentation of your overall architecture and goals as they change. This means that you can really stay thinking about the high level instead of getting sucked into the low level. Coding agents also make it much easier to send something off to vibe out a prototype, or to explore the code base of a library or existing project in detail to figure out the feasibility of some idea, meaning that the parts of planning that traditionally would have taken a lot of effort to verify now have a much lower activation energy, so you're more likely to actually try things out in the process of building a spec.
I believe programming languages are a better language than English for planning the architecture, the algorithms, the domain model, etc.
The way I develop mirrors the process of creating said design document. I start with a high-level overview, define what entities the program should represent, define their attributes, etc... only now I'm using a more specific language than English. By creating a class or a TS interface with some code documentation, I can use my IDE's capabilities to discover connections between entities.
I can then give the code to an LLM to produce a technical document for managers or something. It'll be a throwaway document because such documents are rarely used for actual decision making.
> Obviously, before starting writing, you could sit down and write a software design document that worked out the architecture, the algorithms, the domain model, the concurrency, the data flow, the goals, the steps to achieve it and so on;
I do this with code, and the IDE is much better than MS Word or whatevah at detecting my logical inconsistencies.
The problem is that you actually can't really model or describe a lot of the things that I do with my specifications using code without just ending up fully writing the low-level code. Most languages don't have a type system that actually lets you describe the logic and desired behavior of various parts of the system, which functions should call which other functions, what your concurrency model is, and so on, without just writing the specific code that does it; in fact, I think the only languages that would allow you to do something like that would have to be dependently typed languages or languages adjacent to formal methods. This is literally what pseudocode and architecture graphs and so on are for.
Ah, perhaps. I understood it a little more broadly to include everything beyond pseudocode, rather than purely being able to use your fingers. You can solve a problem with pseudocode, and seasoned devs won't have much of an issue converting it to actual code, but it's not a fun process for everyone.
But this is exactly my point: if your "code" is different than your "pseudocode", something is wrong. There's a reason why people call Lisp "executable pseudocode", and it's because it shrinks the gap between the human-level description of what needs to happen and the text that is required to actually get there. (There will always be a gap, because no one understands the requirements perfectly. But at least it won't be exacerbated by irrelevant details.)
To me, reading the prompt example half a dozen levels up brings to mind Greenspun's tenth rule:
> Any sufficiently complicated C++ program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp. [1]
But now the "program" doesn't even have formal semantics and isn't a permanent artifact. It's like running a compiler and then throwing away the source program and only hand-editing the machine code when you don't like what it does. To me that seems crazy and misses many of the most important lessons from the last half-century.
The problem is that you actually have to implement that high-level DSL to get Lisp to look like that, and most DSLs are not going to be as concise and abstract as a natural-language description of what you want plus a check that it resulted in the right thing. Implementing the DSL is exactly what I'd want to use AI for: writing that initial boilerplate from a high-level description of what the DSL should do.
And a Lisp macro DSL is not going to help with automating refactors, automatically iterating to take care of small compiler issues or minor bugs without your involvement so you can focus on the overall goal, remembering or discovering specific library APIs or syntax, etc.
I think of it more like moving from sole developer to a small team lead. Which I have experienced in my career a few times.
I still write my code in all the places I care about, but I don’t get stuck on “looking up how to enable websockets when creating the listener before I even pass anything to hyper.”
I do not care to spend hours or days to know that API detail from personal pain, because it is hyper-specific, in both senses of hyper-specific.
(For posterity, it’s `with_upgrades`… thanks chatgpt circa 12 months ago!)
I get my dopamine from solving problems, not trying to figure out why that damn API is returning the wrong type of field for three hours. Claude will find it out in minutes - while I do something else. Or from writing 40 slightly different unit tests to cover all the edge cases for said feature.
> it's time to sit back and figure out why there's a mismatch with the problem domain and come back at it from another direction
But this is exactly what LLMs help me with! If I decide I want to shift the abstractions I'm using in a codebase in a big way, I'd usually be discouraged by all the error, lint, and warning chasing I'd need to do to update everything else; with agents I can write the new code (or describe it and have it write it) and then have it set off and update everything else to align: a task that is just varied and context specific enough that refactoring tools wouldn't work, but is repetitive and time consuming enough that it makes sense to pass off to a machine.
The thing is that it's not necessarily a bottleneck in terms of absolute speed (I know my editor well and I'm a fast typist, and LLMs are in their dialup era) but it is a bottleneck in terms of motivation, when some refactor or change in algorithm I want to make requires a lot of changes all over a codebase, that are boring to make but not quite rote enough to handle with sed or IDE refactoring. It really isn't, for me, even mostly about the inconvenience of typing out the initial code. It's about the inconvenience of trying to munge text from one state to another, or handle big refactors that require a lot of little mostly rote changes in a lot of places; but it's also about dealing with APIs or libraries where I don't want to have to constantly remind myself what functions to use, what to pass as arguments, what config data I need to construct to pass in, etc, or spend hours trawling through docs to figure out how to do something with a library when I can just feed its source code directly to an LLM and have it figure it out. There's a lot of friction and snags to writing code beyond typing that has nothing to do with having come up with a wrong abstraction, that very often lead to me missing the forest for the trees when I'm in the weeds.
Also, there is ALWAYS boilerplate scaffolding to do, even with the most macrotastic Lisp; and let's be real: Lisp macros have their own severe downsides in return for eliminating boilerplate, and Lisp itself is not really the best language (in terms of ecosystem, toolchain, runtime, performance) for many or most tasks someone like me might want to do, and languages adapted to the runtime and performance constraints of their domain may be more verbose.
Which means that, yes, we're using languages that have more boilerplate and scaffolding to do than strictly ideally necessary, which is part of why we like LLMs, but that's just the thing: LLMs give you the boilerplate eliminating benefits of Lisp without having to give up the massive benefits in other areas of whatever other language you wanted to use, and without having to write and debug macro soup and deal with private languages.
There's also how staying off the code-writing oars changes how you think about code as well:
> If you think through a problem as you're writing the code for it, you're going to end up going up the wrong creek, because you'll have been furiously rowing head down the entire time, paying attention to whatever local problem you were solving or whatever piece of syntax or library trivia or compiler satisfaction game you were playing instead of the bigger picture.
> Obviously, before starting writing, you could sit down and write a software design document that worked out the architecture, the algorithms, the domain model, the concurrency, the data flow, the goals, the steps to achieve it and so on; but the problem with doing that without an agent is that then it becomes boring. You've basically laid out a plan ahead of time and now you've just got to execute on the plan, which means (even though you might even fairly often revise the plan as you learn unknown unknowns or iterate on the design) that you've kind of sucked all the fun and discovery out of the code-writing process. And it sort of means that you've essentially implemented the whole thing twice.
> Meanwhile, with a coding agent, you can spend all the time you like building up that initial software design document, or specification, and then you can have it implement that. Basically, you can spend all the time in your hammock thinking through things and looking ahead, but then have that immediately directly translated into pull requests you can accept or iterate on instead of then having to do an intermediate step that repeats the effort of the hammock time.
The more accurate prompt would be “You are a mind reader. Create me a plan to create a task manager, define the requirements, deploy it, and tell me when it’s done.”
And then you just rm -rf and repeat until something half works.
"Here are login details to my hosting and billing provider. Create me a SaaS app where customers could rent virtual pets. Ensure it's AI and blockchain and looks inviting and employ addictive UX. I've attached company details for T&C and stuff. Ensure I start earning serious money by next week. I'll bump my subscription then if you deliver, and if not I will delete my account. Go!"
I haven't tried it, but someone at work suggested using voice input for this because it's so much easier to add details and constraints. I can certainly believe it, but I hate voice interfaces, especially if I'm in an open space setting.
You don't even have to be as organised as in the example, LLMs are pretty good at making something out of ramblings.
This is a good start. I write prompts as if I were instructing a junior developer to do the stuff I need. I make them as detailed and clear as I can.
I actually don't like _writing_ code, but enjoy reading it. So sessions with LLM are very entertaining, especially when I want to push boundaries (I am not liking this, the code seems a little bit bloated. I am sure you could simplify X and Y. Also think of any alternative way that you reckon will be more performant that maybe I don't know about). Etc.
This doesn't save me time, but makes work so much more enjoyable.
> I actually don't like _writing_ code, but enjoy reading it.
I think this is one of the divides between people who like AI and people who don't. I don't mind writing code per se, but I really don't like text editing — and I've used Vim (Evil mode) and then Emacs (vanilla keybindings) for years, so it's not like I'm using bad tools; it's just too fiddly. I don't like moving text around; munging control structures from one shape to another; I don't like the busy work of copying and pasting code that isn't worth DRYing, or isn't capable of being DRY'd effectively; I hate going around and fixing all the little compiler and linter errors produced by a refactor manually; and I really hate the process of filling out the skeleton of a type/class/whatever architecture in a new file before getting to the meat.
However, reading code is pretty easy for me, and I'm very good at quickly putting algorithms and architectures I have in my head into words — and, to be honest, I often find this clarifies the high level idea more than writing the code for it, because I don't get lost in the forest — and I also really enjoy taking something that isn't quite good enough, that's maybe 80% of the way there, and doing the careful polishing and refactoring necessary to get it to 100%.
I don't want to be "that guy", but I'll indulge myself.
> I think this is one of the divides between people who like AI and people who don't. I don't mind writing code per se, but I really don't like text editing — and I've used Vim (Evil mode) and then Emacs (vanilla keybindings) for years, so it's not like I'm using bad tools; it's just too fiddly.
I feel the same way (to at least some extent) about every language I've used other than Lisp. Lisp + Paredit in Emacs is the most pleasant code-wrangling experience I've ever had, because rather than having to think in terms of characters or words, I'm able to think in terms of expressions. This is possible with other languages thanks to technologies like Tree-sitter, but I've found that it's only possible to do reliably in Lisp. When I do it in any other language I don't have an unshakable confidence that the wrangling commands will do exactly what I intend.
When I code, I mostly go by two perspectives: The software as a process and the code as a communication medium.
With the software as a process, I'm mostly thinking about the semantics of each expression. Either there's a final output (transient, but important) or there's a mutation to some state. So the code I'm writing is for making either one possible, and the process is very pleasing, like building with Lego. The symbols are the bricks and other items which I'm using to create things that do what I want.
With the code as communication, I mostly take the above and make it readable. Like organizing files, renaming variables and functions, modularising pieces of code. The intent is for other people (including future me) to be able to understand and modify what I created in the easiest way possible.
So the first is me communicating with the machine, the second is me communicating with the humans. The first is very easy, you only need to know the semantics of the building blocks of the machine. The second is where the craft comes in.
Emacs (also Vim) makes both easy. Code has a very rigid structure, and both have tools that let you manipulate that structure, either to add new actions or to refine the shape for understanding.
With AI, it feels like painting with a brick. Or transmitting critical information through a telephone game. Control and Intent are lost.
Yes! Don't worry about it, I very much agree. However, I do think that even if/when I'm using Lisp and have all the best structural editing capabilities at my disposal, I'd still prefer to have an agent do my editing for me; I'd just be 30% more likely to jump in and write code myself on occasion — because ultimately, even with structural editing, you're still thinking about how to apply this constrained set of operations to manipulate a tree of code to get it to where you want, and then having to go through the grunt work of actually doing that, instead of thinking about what state you want the code to be in directly.
Vehement agreement below:
S-expressions are a massive boon for text editing, because they allow such incredible structural transformations and motions. The problem is that, personally, I don't actually find Lisp to be the best tool for the job for any of the things I want to do. While I find Common Lisp, and to a lesser degree Scheme, to be fascinating languages, the state of the library ecosystem, documentation, toolchain, and IDEs around them just isn't satisfactory to me, and they don't seem really well adapted to the things I want to do. And yeah, while I could spend my time optimizing Common Lisp with `declare`s and doing C-FFI with it, massaging it to do what I want, that's not what I want to spend my time doing. I want to actually finish writing tools that are useful to me.
Moreover, while I used to have hope for tree-sitter to provide a similar level of structural editing for other languages, at least in most editors I've just not found that to be the case. There seem really to be two ways to use tree-sitter to add structural editing to languages: one, to write custom queries for every language, in order to get Vim style syntax objects, and two, to try to directly move/select/manipulate all nodes in the concrete syntax tree as if they're the same, essentially trying to treat tree-sitter's CSTs like S-expressions.
The problem with the first approach is that you end up with really limited, often buggy or incomplete, language support, and structural editing that requires a lot more cognitive overhead: instead of navigating a tree fluidly, you're having to "think before you act," deciding ahead of time what the specific name, in this language, is for the part of the tree you want to manipulate. Additionally, this approach makes it much more difficult to do more high level, interesting transformations; even simple ones like slurp and barf become a bit problematic when you're dealing with such a typed tree, and more advanced ones like convolute? Forget about it.
The problem with the second approach is that, if you're trying to do generalized tree navigation, where you're not up-front naming the specific thing you're talking about, but instead navigating the concrete syntax tree as if it's S-expressions, you run into the problem the author of Combobulate and Mastering Emacs talks about[1]: CSTs are actually really different from S-expressions in practice, because they don't map uniquely onto source code text; instead, they're something overlaid on top of the source code text, which is not one to one with it (in terms of CST nodes to text token), but many to one, because the CST is very granular. Which means that there's a lot of ambiguity in trying to understand where the user is in the tree, where they think they are, and where they intend to go.
There's also the fact that tree-sitter CSTs contain a lot of unnamed nodes (what I call "stop tokens"), where the delimiters for a node of a tree and its children are themselves children of that node, siblings with the actual siblings. And to add insult to injury, most language syntaxes just... don't really lend themselves to tree navigation and transformation very well.
I actually tried to bring structural editing to a level equivalent to the S-exp commands in Emacs recently[2], but ran into all of the above problems. I recently moved to Zed, and while its implementation of structural editing and movement is better than mine, and pretty close to 1:1 with the commands available in Emacs (especially if they accept my PR[3]), and also takes the second, language-agnostic, route, it's still not as intuitive and reliable as I'd like.
This is similar to how I prompt, except I start with a text file and design the solution and paste it in to an LLM after I have read it a few times. Otherwise, if I type directly in to the LLM and make a mistake it tends to come back and haunt me later.
I think it’s usage patterns. It is you in a sense.
You can't deny that someone like Ryan Dahl, the creator of Node.js, declaring that he no longer writes code is objectively contrary to your own experience. Something is different.
I think you and other deniers try one prompt, then see the issues and stop.
Programming with AI is like tutoring a child. You teach the child, tell it where it made mistakes and you keep iterating and monitoring the child until it makes what you want. The first output is almost always not what you want. It is the feedback loop between you and the AI that cohesively creates something better than each individual aspect of the human-AI partnership.
> Programming with AI is like tutoring a child. You teach the child, tell it where it made mistakes and you keep iterating and monitoring the child until it makes what you want.
Who are you people who spend so much time writing code that this is a significant productivity boost?
I'm imagining doing this with an actual child and how long it would take for me to get a real return on investment at my job. Nevermind that the limited amount of time I get to spend writing code is probably the highlight of my job and I'd be effectively replacing that with more code reviews.
I recently inherited a web project that is over a decade old, full of EOL'd libraries and OS packages, that desperately needed to be modernized.
Within 3 hours I had a working test suite with 80% code coverage on core business functionality (~300 tests). Now - maybe the tests aren't the best designs given there is no way I could review that many tests in 3 hours, but I know empirically that they cover a majority of the code of the core logic. We can now incrementally upgrade the project and have at least some kind of basic check along the way.
There's no way I could have pieced together as large of a working test suite using tech of that era in even double that time.
> maybe the tests aren't the best designs given there is no way I could review that many tests in 3 hours,
If you haven't reviewed and signed off then you have to assume that the stuff is garbage.
This is the crux of using AI to create anything and it has been a core rule of development for many years that you don't use wizards unless you understand what they are doing.
I used a code coverage tool to guarantee it was exercising the logic, but I did not verify the logic checks myself. The biggest risk is that I have no way of knowing whether I codified actual bugs with tests, but if that's the case, those bugs were already there anyways.
I'd say for what I'm trying to do - which is upgrade a very old version of PHP to something that is supported, this is completely acceptable. These are basically acting as smoke tests.
You need to be a bit careful here. A test that runs your function and then asserts something useless like 'typeof response == object' will also meet those code coverage numbers.
In reality, modern LLMs write tests that are more meaningful than that, but it's still worth testing the assumption and thinking up your own edge cases.
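To make that concrete, here's a hedged sketch (the `parse_config` function and its fields are invented for illustration): both tests below execute exactly the same lines, so they report identical coverage, but only the second one pins down any behavior.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { int port; } Config;

/* Tiny stand-in for the real code under test. */
static Config *parse_config(const char *text) {
    Config *c = malloc(sizeof *c);
    if (!c) return NULL;
    if (sscanf(text, "port=%d", &c->port) != 1) c->port = -1;
    return c;
}

/* Runs the parser, so coverage goes up, but it still passes
 * even if every parsed field is wrong. */
static void test_parse_config_useless(void) {
    Config *c = parse_config("port=8080\n");
    assert(c != NULL);
    free(c);
}

/* Same lines covered, but this one actually pins the behavior down. */
static void test_parse_config_meaningful(void) {
    Config *c = parse_config("port=8080\n");
    assert(c != NULL && c->port == 8080);
    free(c);
}

int main(void) {
    test_parse_config_useless();
    test_parse_config_meaningful();
    return 0;
}
```

Coverage tells you the code ran, not that anything meaningful was checked.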
I code firmware for a heavily regulated medical device (where mistakes mean life and death), and I try to have AI write unit tests for me all the time, and I would say I spend about 3 days correcting and polishing what the AI gives me in 30 minutes. The first pass the AI gives me, likely saves a day of work, but you would have to be crazy to trust it blindly. I guarantee it is not giving you what you think it is or what you need. And writing the tests is when I usually find and fix issues in the code. If AI is writing tests that all pass without updating the code then it's likely falsely telling you the code is perfect when it isn't.
If you're using a code coverage tool to identify the branches it's hitting in the code, you at least have a guarantee that it is testing the code it's writing tests for, as long as you check the assertions. I could be codifying bugs with tests and probably did (but they were already there anyways). For the purpose of upgrading OS libraries and surrounding software, this is a good approach - I can incrementally upgrade the software, run all the tests, and see if anything falls over.
I'm not having AI write tests for life-or-death software nor did I claim that AI wrote tests that all pass without updating any code.
You know they cause a majority of the code of the core logic to execute, right? Are you sure the tests actually check that those bits of logic are doing the right thing? I've had Claude et al. write me plenty of tests that exercise things and then explicitly swallow errors and pass.
Yes, the first hour or so was spent fiddling with test creation. It started out doing its usual wacky behavior, like checking the existence of a method and calling that a "pass", creating a mock object that mocked the return result of the logic it was supposed to be testing, and (my favorite) copying the logic out of the code and putting it directly into the test. Lots of course correction, but once I had one well-written test that I had fully proofed myself, I just provided it that test as an example and it did a pretty good job following those patterns for the remainder.
I still sniffed through all the output for LLM wackiness, though. Using a code coverage tool also helps a lot.
... Yeah, those tests are probably garbage. The models probably covered the 80% that consists of boilerplate and mocked out the important 20% that was the critical business logic. That's how it was in my experience.
And maybe child is too simplistic of an analogy. It's more like working with a savant.
The type of thing you can tell AI to do is like this: You tell it to code a website... it does it, but you don't like the pattern.
Say, "use functional programming", "use camel-case" don't use this pattern, don't use that. And then it does it. You can leave it in the agent file and those instructions become burned into it forever.
A better way to put it is with this example: I put my symptoms into ChatGPT and it gives some generic info with a massive "not-medical-advice" boilerplate and refuses to give specific recommendations. My wife (an NP) puts in anonymous medical questions and gets highly specific med terminology heavy guidance.
That's all to say, the learning curve with LLMs is how to say things a specific way to reliably get an outcome.
These people are just the same charlatans and scammers you saw in the web3 sphere. Invoking Ryan Dahl as some sort of authority figure and not a tragic figure that sold his soul to VC companies is even more pathetic.
My personal suspicion is that the detractors value process and implementation details much more highly than results. That would not surprise me if you come from a business that is paid for its labor inputs and is focused on keeping a large team billable for as long as possible. But I think hackers and garage coders see the value of “vibing” as they are more likely to be the type of people who just want results and view all effort as margin erosion rather than the goal unto itself.
The only thing I would change about what you said is, I don’t see it as a child that needs tutoring. It feels like I’m outsourcing development to an offshore consultancy where we have no common understanding, except the literal meaning of words. I find that there are very, very many problems that are suited well enough to this arrangement.
My 2c: there is a divide, unacknowledged, between developers that care about "code correctness" (or any other quality/science/whatever adjective you like) and those who care about the whole system they are creating.
I care about making stuff. "Making stuff" means stuff that I can use. I care about code quality yes, but not to an obsessive degree of "I hate my framework's ORM because of <obscure reason nobody cares about>". So, vibe coding is great, because I know enough to guide the agent away from issues or describe how I want the code to look or be changed.
This gets me to my desired effect of "making stuff" much faster, which is why I like it.
My other 2c: There are Engineers who are concerned by the long-term consequences of their work e.g. maintainability.
In real engineering disciplines, the Engineer is accountable for their work. If a bridge you signed off on collapses, you're accountable, and if it turns out you were negligent you'll face jail time. In software, the equivalent might be a program in a car.
The Engineering mindset embodies these principles regardless of regulatory constraints. The Engineer needs to keep in mind those who'll be using their constructions. With agentic vibe coding, I can never get confident that the resulting software will behave according to spec. I'm worried that it'll screw over the user, the client, and all stakeholders. I can't accept half-assed work just because it saved me 2 days of typing.
I don't make stuff just for the sake of making stuff otherwise it would just be a hobby, and in my hobbies I don't need to care about anything, but I can't in good conscience push shit and slop down other people's throats.
The industry cares about reasonable results not perfection.
If vibe coding delivers in one day, + an additional 2 days to solve stupid bugs, what you deliver with utter perfection in 3 months, then the industry doesn't give a shit about slop.
Is it maintainable? Well it's AI that's going to maintain it.
I think the future will turn into one where source code is like assembly code. Do you care about how your automated compiler system is spitting out assembly? Is the assembly code, neat and organized and maintainable? No. You don't care about assembly code. The industry is shifting in the direction where they don't care about ALL source code.
> Is it maintainable? Well it's AI that's going to maintain it.
That's what's currently not possible, it might work in a small webapp or similar.
But in a large system, it absolutely falls apart when having to maintain it.
Sure, it can fix a bug, but it doesn't understand the side effects it creates with the fix, yet.
Maybe in the future that will also be possible. I do agree with you about business/management not caring about long term impacts if short term gains are possible.
In real Engineering disciplines the process is important, and is critical for achieving desired results; that's why there are manuals and guidelines measured in the hundreds of pages for things like driving a pile into dirt. There are rigorous testing procedures to ensure everything is correct and up to spec, because there are real consequences.
Software Developers have long been completely disconnected from the consequences of their work, and tech companies have diluted responsibility so much that working software doesn't matter anymore. This field is now mostly scams and bullshit, where developers are closer to finance bros than real, actual Engineers.
I'm not talking about what someone is building in their home for personal reasons, for their own usage, but about giving the same thing to other people.
Nah, I'm with you there. I've yet to see even Opus 4.5 produce something close to production-ready -- in fact Opus seems like quite a major defect factory, given its consistent tendency toward hardcoding case by case workarounds for issues caused by its own bad design choices.
I think uncritical AI enthusiasts are just essentially making the bet that the rising mountains of tech debt they are leaving in their wake can be paid off later on with yet more AI. And you know, that might even work out. Until such a time, though, and as things currently stand, I struggle to understand how one can view raw LLM code and find it acceptable by any professional standard.
Working code doesn’t mean the same for everyone. My coworker just started vibe coding. Her code works… on happy paths. It absolutely doesn’t work when any kind of error happens. It’s also absolutely impossible to refactor it in any way. She thinks her code works.
The same coworker asked to update a service to Spring Boot 4. She made a blog post about it. She used an LLM for it. So far, every point I've read was a lie, and her workarounds make the tests, for example, unnecessarily less readable.
So yeah, "it works", until it doesn't, and when it hits you, you realize you end up doing more work in total, because there are more obscure bugs, and fixing those is more difficult because of the terrible readability.
I can't help but think of my earliest days of coding, 20ish years ago, when I would post my code online looking for help on a small thing, and being told that my code is garbage and doesn't work at all even if it actually is working.
There are many ways to skin a cat, and in programming the happens-in-a-digital-space aspect seemingly removes all boundaries, leading to fractal ways to "skin a cat".
A lot of programmers have hard heads and know the right way to do something. These are the same guys who criticized every other senior dev as being a bad/weak coder long before LLMs were around.
Parent's profile shows that they are an experienced software engineer in multiple areas of software development.
Your own profile says you are a PM whose software skills amount to "Script kiddie at best but love hacking things together."
It seems like the "separate worlds" you are describing are just the impressions of a seasoned engineer versus an amateur reviewing the code base. It shouldn't be even a little surprising that your impression of the result is much rosier than that of a more experienced developer.
At least in my experience, learning to quickly read a code base is one of the later skills a software engineer develops. Generally only very experienced engineers can dive into an open source code base to answer questions about how the library works and is used (typically, most engineers need documentation to aid them in this process).
I mean, I've dabbled in home plumbing quite a bit, but if AI instructed me to repair my pipes and I thought it "looked great!" but an experienced plumber's response was "ugh, this doesn't look good to me, lots of issues here" I wouldn't argue there are "two separate worlds".
> It shouldn't be even a little surprising that your impression of the result is much rosier than that of a more experienced developer.
This really is it: AI produces bad to mediocre code. To someone who produces terrible code mediocre is an upgrade, but to someone who produces good to excellent code, mediocre is a downgrade.
Today. It produces mediocre code today. That is really it. What is the quality of that code compared to 1 year ago. What will it be in 1 year? Opus 6.5 is inevitable.
That's what they've been saying for years now. Seems like the same FSD marketing. Any day now it'll be driving across the country! Just you wait! -> Any day now it'll be replacing software developers! Just you wait! Frankly, the same people who fell for the former are falling for the latter.
Rather, to me it looks like all we're getting with additional time is marginal returns. What'll it be in 1 year? Marginally better than today, just like today is marginally better compared to a year ago. The exponential gains in performance are already over. What we're looking at now is exponentially more work for linear gains in performance.
You think it'll rapidly get smarter, but it just recreates things from all the terrible code it was fed.
Code and how it is written also rapidly changes these days and LLMs have some trouble drawing lines between versions of things and the changes within them.
Sure, they can compile and test things now, which might make the code work and able to run. The quality of it will be hard to increase without manually controlling and limiting the type of code it 'learns' from.
Except I work with extremely competent software engineers on software used in mission-critical applications in the Fortune 500. I call myself a script kiddie because I did not study Computer Science. Am I green in the test run? Does it pass load tests? Is it making money? While some of y'all are worried about leaky abstractions, we just closed another client. Two worlds for sure, where one team is skating to where the puck is going and looking to raise cattle, while another wants to continue nurturing an exotic pet.
Plenty of respect to the craft of code, but the AI of today is the worst it is ever going to be.
Can you just clarify the claim you're making here: you personally, as a PM, are shipping vibe-coded features that make it into prod, and those prod features you're building are largely vibe coded?
It depends heavily on the scope and type of problem. If you're putting together a standard isolated TypeScript app from scratch it can do wonders, but many large systems are spread between multiple services, use abstractions unique to the project, and are generally dealing with far stricter requirements. I couldn't depend on Claude to do some of the stuff I'd really want, like refactor the shared code between six massive files without breaking tests. The space I can still have it work productively in is still fairly limited.
That's a significant rub with LLMs, particularly hosted ones: the variability. Add in quantization, speculative decoding, and dynamic adjustment of temperature, nucleus sampling, attention head count, & skipped layers at runtime, and you can get wildly different behaviors with even the same prompt and context sent to the same model endpoint a couple hours apart.
That's all before you even get to all of the other quirks with LLMs.
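As a toy illustration of just the sampling slice of that variability (made-up logits, nothing tied to any real model or endpoint): temperature scaling plus nucleus (top-p) truncation already makes the chosen token depend on runtime settings and the random draw, before any serving-side changes like quantization or speculative decoding enter the picture.

```c
/* Toy temperature + nucleus (top-p) sampling over made-up logits.
 * Change the temperature, top_p, or seed and the same "prompt" can
 * yield a different token. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define V 5  /* toy vocabulary size */

static int sample_token(const double *logits, double temperature,
                        double top_p, unsigned seed) {
    double p[V], sum = 0.0, mass = 0.0;
    int order[V] = {0, 1, 2, 3, 4};
    int keep = 0;

    /* Softmax with temperature: higher temperature flattens the distribution. */
    for (int i = 0; i < V; i++) { p[i] = exp(logits[i] / temperature); sum += p[i]; }
    for (int i = 0; i < V; i++) p[i] /= sum;

    /* Sort indices by probability, descending (fine for a 5-token vocab). */
    for (int i = 0; i < V; i++)
        for (int j = i + 1; j < V; j++)
            if (p[order[j]] > p[order[i]]) {
                int t = order[i]; order[i] = order[j]; order[j] = t;
            }

    /* Nucleus truncation: keep the smallest prefix whose mass reaches top_p. */
    while (keep < V && mass < top_p) mass += p[order[keep++]];

    /* Draw from the truncated, renormalized distribution. */
    srand(seed);
    double r = (double)rand() / RAND_MAX * mass, acc = 0.0;
    for (int i = 0; i < keep; i++) {
        acc += p[order[i]];
        if (r <= acc) return order[i];
    }
    return order[keep - 1];
}

int main(void) {
    const double logits[V] = {2.0, 1.6, 1.5, 0.3, -1.0};  /* same "prompt" */
    printf("T=0.7, top_p=0.9, seed=1 -> token %d\n", sample_token(logits, 0.7, 0.9, 1));
    printf("T=1.3, top_p=0.9, seed=2 -> token %d\n", sample_token(logits, 1.3, 0.9, 2));
    return 0;
}
```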
I've found that the thing that made it really click for me was having reusable rules (each agent accepts these differently) that help tell it the patterns and structure you want.
I have ones that describe what kinds of functions get unit vs integration tests, how to structure them, and the general kinds of test cases to check for (they love writing way too many tests IME). It has reduced the back and forth I have with the LLM telling it to correct something.
Usually the first time it does something I don't like, I have it correct it. Once it's in a satisfactory state, I tell it to write a Cursor rule describing the situation BRIEFLY (it gets way too verbose by default) and how to structure things.
That has made writing LLM code so much more enjoyable for me.
It's really becoming a good litmus test of someone's coding ability whether they think LLMs can do well on complex tasks.
For example, someone may ask an LLM to write a simple HTTP web server, and it can do that fine, and they consider that complex, when in reality it's really not.
It's not. There are tons of great programmers, big names in the industry, who now exclusively vibe code. Many of these names are obviously intelligent, great programmers.
People use "vibe coding" to mean different things - some mean the original Karpathy "look ma, no hands!", feel the vibez, thing, and some just (confusingly) use "vibe coding" to refer to any use of AI to write code, including treating it as a tool to write small well-defined parts that you have specified, as opposed to treating it as a magic genie.
There also seem to be people hearing big names like Karpathy and Linus Torvalds say they are vibe coding on their hobby projects, meaning who knows what, and misunderstanding this as being an endorsement of "magic genie" creation of professional quality software.
Results of course also vary according to how well what you are asking the AI to do matches what it was trained on. Despite sometimes feeling like it, it is not a magic genie - it is a predictor that is essentially trying to best match your input prompt (maybe a program specification) to pieces of what it was trained on. If there is no good match, then it'll have a go anyway, and this is where things tend to fall apart.
Funny, the last interview I watched with Karpathy he highlighted the way the AI/LLM was unable to think in a way that aligned with his codebase. He described vibe-coding a transition from Python to Rust but specifically called out that he hand-coded all of the python code due to weaknesses in LLM's ability to handle performant code. I'm pretty sure this was the last Dwarkesh interview with "LLMs as ghosts".
Right, and he also very recently said that he felt essentially left behind by AI coding advances, thinking that his productivity could be 10x if he knew how to use it better.
It seems clear that Karpathy himself is well aware of the difference between "vibe coding" as he defined it (which he explicitly said was for playing with on hobby projects), and more controlled productive use of AI for coding, which has either eluded him, or maybe his expectations are too high and (although it would be surprising) he has not realized the difference between the types of application where people are finding it useful, and use cases like his own that do not play to its strength.
I don't think he meant to start a movement - it was more of a throw-away tweet that people took way too seriously, although maybe with his bully pulpit he should have realized that would happen.
They are more effective than on-the-ground, in-your-face evidence, largely because people who are so against AI are blind to it.
I hold a result of AI in front of their face and they still proclaim it's garbage and everything else is fraudulent.
Let's be clear. You're arguing against a fantasy. Nobody, not even proponents of AI, claims that AI is as good as humans. Nowhere near it. But they are good enough for pair programming. That is indisputable. Yet we have tons of people like you who stare at reality, deny it, and call it fraudulent.
Examine the lay of the land: if that many people are so divided, it really means both perspectives are correct in a way.
Just to be more pedantic, there is more nuance to all of that.
Nobody smart is going to disagree that LLMs are a huge net positive. The finer argument is whether or not at this point you can just hand off coding to an LLM. People who say yes simply haven't used LLMs extensively enough. The amount of time you have to spend prompt engineering the correct response is often the same amount of time it takes for you to write the correct code yourself.
And yes, you can put together AGENT.md files, mcp servers, and so on, but then it becomes a game of this. https://xkcd.com/1205/
If you want to be any good at all in this industry, you have to develop enough technical skills to evaluate claims for yourself. You have to. It's essential.
Because the dirty secret is a lot of successful people aren't actually smart or talented, they just got lucky. Or they aren't successful at all, they're just good at pretending they are, either through taking credit for other people's work or flat out lying.
I've run into more than a few startups that are just flat out lying about their capabilities and several that were outright fraud. (See DoNotPay for a recent fraud example lol)
Pointing to anyone and going "well THEY do it, it MUST work" is frankly engineering malpractice. It might work. But unless you have the chops to verify it for yourself, you're just asking to be conned.
Steve Yegge (Veteran engineer, formerly Google and Amazon): A leading technical voice who describes vibe coding as acting as an orchestrator. He maintains that engineers who do not master "agentic engineering" and AI-driven workflows will be left behind as the industry moves toward "hyperproductivity".
Patrick Debois (Founder of DevOps): Often called the "godfather of DevOps," Debois now advocates for the "AI native developer". He views vibe coding as a high-level abstraction where the engineer's role shifts from a "producer" of lines of code to a "supervisor" of complex automated systems.
Simon Willison (Co-creator of Django): Recognized for his highly technical workflows that use AI to handle mechanical implementation while he focuses on rigorous documentation, tool coverage, and validation—a process often cited as the professional gold standard for vibe coding.
Stephen Blum (Founder/CTO of PubNub): A technical leader who has integrated generative coding into production-scale architecture. He characterizes the 2026 developer's role as directing agents for everything from database migrations to security audits rather than manually performing these tasks.
Gene Kim (Renowned DevOps researcher and author): Co-author of The Phoenix Project, Kim has publicly championed vibe coding as one of the most enjoyable technical experiences of his career, citing how it allows him to build sophisticated prototypes in minutes rather than days.
Geoffrey Huntley (Founder of the "Vibe Coding Academy"): A highly technical engineer known for pushing the boundaries of AI-driven development. He is a primary source for experimental techniques that use agents for everything from infrastructure to core logic.
Boris Cherny (Author of Programming TypeScript): An authority on type systems and engineering rigor, Cherny now provides deep technical guidance on how to integrate high-level intent with reliable, production-ready source code using tools like Claude Code.
Stephen Webb (UK CTO at Capgemini): A key industry figure declaring 2026 as the year "AI-native engineering goes mainstream". He supports vibe coding as a legitimate method for rewriting legacy systems and refactoring entire modules autonomously.
Linus Torvalds (Creator of Linux and Git): In a significant endorsement for the paradigm, Torvalds reported in early 2026 that he used Google Antigravity to vibe code a Python visualizer for his AudioNoise project. He noted in the project's documentation that the tool was "basically written by vibe-coding".
Theo Browne (Founder of Ping.gg, T3.gg): Known for his deep technical influence on the web development community, Browne is a primary educator for tools like Claude Code. He advocates for vibe coding as a way to bypass the "boring parts" of development, allowing engineers to focus on higher-level architecture and product logic.
McKay Wrigley (Developer and AI educator): A leading technical figure focused on structured tutorials and advanced workflows for agentic programming. He is widely followed by senior engineers seeking to move beyond simple chat interfaces into full-scale autonomous software generation.
Charlie Holtz (Software engineer and infrastructure specialist): Known for building advanced infrastructure tools, Holtz is recognized as an engineer "pushing the boundaries" of what can be built using vibe coding for complex, back-end systems.
Cian Clarke (Principal Engineer at NearForm): A veteran in the Node.js ecosystem who has transitioned toward spec-driven development. He advocates for "AI native engineering" where specialized agentic roles (such as security or performance agents) are orchestrated to build and refactor large-scale enterprise systems.
IndyDevDan (Senior developer and educator): A highly technical voice advocating for "deep mastery" of AI-assisted engineering. He focuses on teaching developers how to maintain rigorous engineering standards while leveraging the speed of vibe coding.
Mitchell Hashimoto (Founder of HashiCorp, Creator of Terraform): Now focused on his terminal project Ghostty, Hashimoto has become a leading voice on "pragmatic AI coding." In 2026, he detailed his workflow of using reasoning models (like o3) to generate comprehensive architecture plans before writing a single line of code. He argues this "learning accelerator" approach allows him to build outside his primary expertise (e.g., frontend) while maintaining strict engineering rigor by reviewing the output line-by-line.
Kent C. Dodds (Renowned Web Development Educator & Engineer): A highly influential figure in the React community, Dodds has fully embraced the paradigm, stating in 2026 that he has "never had so much fun developing software." He advocates for a "problem elimination" mindset where AI handles the implementation details, allowing senior engineers to focus entirely on user experience and application architecture.
Guillermo Rauch (CEO of Vercel, Creator of Next.js/Socket.io): Rauch has been a vocal proponent of vibe coding as the bridge between business logic and shipping software. He argues that vibe coding solves the "execution gap," enabling technical founders and engineers to ship complex products without getting bogged down in boilerplate, effectively treating the AI as a "junior engineer with infinite stamina" that requires high-level direction.
DHH (David Heinemeier Hansson) (Creator of Ruby on Rails & CTO at 37signals): Historically a skeptic of industry hype, DHH has acknowledged the "tipping point" in 2026, noting that agentic coding has become a viable tool for experienced developers to deliver on specs rapidly. His shift represents a major endorsement from the "craftsman" sector of the industry, validating that AI tools can coexist with high standards for code quality.
Rich Harris (Creator of Svelte): Harris has spoken about how AI-driven workflows liberate developers from "code preferences" and syntax debates. He views the 2026 landscape as one where the engineer's job is to focus on the "what" and "why" of a product, while AI increasingly handles the "how," allowing for a renaissance in creativity and shipping speed.
Addy Osmani (Engineering Manager for Chrome Web Platform): While deeply embedded in the browser ecosystem, Osmani has published extensively on his "AI-augmented" workflow in 2026. He characterizes the modern senior engineer not as a typist but as a "Director," whose primary skill is effectively guiding AI agents to execute complex engineering tasks while maintaining architectural integrity.
The above is just a smattering of individuals. I can keep going.
>If you want to be any good at all in this industry, you have to develop enough technical skills to evaluate claims for yourself. You have to. It's essential.
This is an orthogonal, off-topic point. My skills, or anyone else's, have nothing to do with the topic at hand. The topic at hand is AI.
>Because the dirty secret is a lot of successful people aren't actually smart or talented, they just got lucky. Or they aren't successful at all, they're just good at pretending they are, either through taking credit for other people's work or flat out lying.
Again, orthogonal to the point. But I'll entertain it. There's another class of people who are delusional: they think they're good, but they're not good at all. I've seen plenty of that in the industry. More so than people who lie, it's people who lie to themselves and believe it. Engineers so confident in their skills, but when I look at them I think they're raw dog shit.
>I've run into more than a few startups that are just flat out lying about their capabilities and several that were outright fraud. (See DoNotPay for a recent fraud example lol)
Again: so what?
>Pointing to anyone and going "well THEY do it, it MUST work" is frankly engineering malpractice. It might work. But unless you have the chops to verify it for yourself, you're just asking to be conned.
Of course. But it's idiotic to dismiss a huge population of people who are smarter than you, better than you, and proven to be more capable than you, all saying they can do it. I need to emphasize that it's not just one person saying it. Tons and tons of people are saying it.
Fraud happens in the margins of society; it rarely happens at a macro level, and when it does, the trend doesn't last long and mostly dies out within a year at most.
So when multitudes of highly reputed people are saying one thing, and your on-the-ground self-verification of that thing is directly opposite of what they are saying, then you need to re-evaluate your OWN verification. You need to investigate WHY there is a discrepancy, because it is utter stupidity to write off what others have seen as fraud while believing that your own judgements and verifications are flawless.
No offense, my dude, but your philosophy on this topic embodies the delusional stupidity I am talking about. People lie to themselves. That is the key metric here.
I don't need to explain ANY of this to you. You know it, because every explanation I just gave is an OBVIOUS facet of life in general. It needs to be explained to someone like you, despite its obviousness, because of self-delusion.
> Fraud happens in the margins of society it rarely ever happens at a macro level, and if it does happen at a macro level the trend doesn't last long and will mostly die within a year at most
Ahahahahahaha. Oh man. I think you have some reallll hard lessons in front of you about the nature of industries that have lots and lots of money being thrown at them.
I have been a part of this industry for 10+ years at this point, at companies you have heard of. There are a lot -- I mean a lot -- of people who will do and say anything if they think it'll get them something.
Yes, that includes people who have pedigrees. Yes, that includes people with all the traits you mention. It's the nature of being in an industry where money gets thrown around in buckets.
You don't have to be a cynic about people, you don't have to be paranoid, and it doesn't have to poison your outlook on life. I work with lots of smart, great folks and I don't walk around eyeing my coworkers suspiciously. You do need to be street smart.
If the start and end of your critical thinking is "well, this person said so," that's not critical thinking; the polite word for it is star-chasing. If you don't or can't develop the technical chops to evaluate claims for yourself, you'll never get out of that trap.
I’m talking about actual public fraudulent lies. These are weeded out quickly. Think flat earth.
> I have been a part of this industry for 10+ years at this point, at companies you have heard of.
I've been at it longer. And at companies whose products you use every day.
> I mean a lot -- of people who will do and say anything if they think it'll get them something.
You're not that bright, are you? Of course they will. I'm talking about public fraud, like the flat earth movement. These things don't last long. I'm not talking about human nature and people's predilection for lying.
Your brain is somehow fixated on thinking you're some 10-year veteran (oooooh, you're so great) who's seen it all and is talking to a greenhorn, when really you're just not smart enough to understand what's being said. Bro, wake up. You missed the point and went off on a tangent.
> You do need to be street smart.
This is next level. Let me spell it out for you: you're not street smart. You're not smart. You don't look at things critically, and you don't examine your own judgements. You just approach everything with a sort of cocky confidence and you get shit wrong. Constantly. You "clocked" me completely wrong, and your comments all over HN are wildly and factually off base.
I think the author is way understating the uselessness of LLMs in any serious context outside of a demo to an investor. I've had nothing but low-IQ nonsense from every SOTA model.
If we're being honest with ourselves, Opus 4.5 / GPT 5.2, etc., are maybe 10-20% better than GPT 3.5 at most. It's a total and absolute catastrophic failure that will go down in history as one of humanity's biggest mistakes.
You don't have to be bad at coding to use LLMs. The argument was specifically about thinking that LLMs can be great at accomplishing complex tasks (which they are not).
And my point is that what you think are complex tasks are not really complex.
The simple case is that if you ask an agent to make a whole bunch of modifications across a large number of files, it often loses track because of context-window limits.
Now, you can build your own agents with custom MCP servers to improve their ability to do such tasks, but then you are basically just building automation tools in the first place.
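To illustrate what I mean by "building automation tools": a minimal sketch of feeding the agent one file at a time so each request stays inside the context window. `edit_file_with_model` is a hypothetical stand-in for whatever model or agent API you actually use; the orchestration around it is the part you end up writing yourself.

    # Sketch: apply one set of instructions file-by-file so each call stays
    # well inside the context window. `edit_file_with_model` is a hypothetical
    # stand-in for whatever model/agent API you use.
    from pathlib import Path

    def apply_change_per_file(instructions, paths, edit_file_with_model):
        for path in paths:
            source = Path(path).read_text()
            # The model sees only the instructions plus a single file at a time.
            updated = edit_file_with_model(instructions, filename=path, content=source)
            Path(path).write_text(updated)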
See my other response. I didn't define what a complex task is. I pointed to people with reputation, intelligence, and ability greater than my own: if they endorse it, they must be using it on complex tasks, and it must be working for them.
I can certainly see how you're better than every one of those people, and how what they call "complex" is, to you, just simplistic. I've never met anyone as great as you.
Great programmers wouldn't support or back AI if it couldn't handle complex tasks. AI can handle complex tasks inconsistently when operating on its own. It can handle complex tasks consistently when pair-programming with a human operator.
The secret sauce for me is Beads. Once Beads is set up, you make the tasks and refine them, and by the end each task is a very detailed prompt. I have Claude ask me clarifying questions, do research on best practices, etc.
Because of Beads, I can have Claude do a code review for serious bugs and issues, and sure enough it finds some interesting things I overlooked.
I have also seen my peers in the reverse-engineering field make breakthroughs emulating runtimes for which there were no, or only limited, existing implementations, all built from the ground up, mind you.
I think the key is thinking of yourself as an architect / mentor for a capable and promising junior developer.
You're not taking crazy pills, this is my exact experience too. I've been using my wife's eCommerce shop (a headless Medusa instance, which has pretty good docs and even their own documentation LLM) as a 100% vibe-coded project using Claude Code, and it has been one comedy of errors after another. I can't tell you how many times I've had it go through the loop of Cart + Payment Collection link is broken -> Redeploy -> Webhook is broken (can't find payment collection) -> Redeploy -> Cart + Payment Collection link is broken -> Repeat. And it never seems to remember the reasons it had done something previously – despite it being plastered 8000 times across the CLAUDE.md file – so it bumbles into the same fuckups over and over again.
A complete exercise in frustration that has turned me off of all agentic code bullshit. The only reason I still have Claude Code installed is because I like the `/multi-commit` skill I made.
The other side of this coin is the non-developer stakeholders who Dunning-Kruger themselves into firm conclusions on technical subjects with LLMs: "Well, I can code this up in an hour, two max. Why is it taking you ten hours?" I've (anecdotally) even had project sponsors approach me with an LLM's judgement on their working relationship with me as if it were gospel, like "It said that we aren't on the same page. We need to get aligned." It gets weird.
These cases are common enough that it's more systemic than isolated.
I read these comments and articles and feel like I am completely disconnected from most people here. Why not use GenAI the way it actually works best: like autocomplete on steroids. You stay the architect, and you have it write code function by function. Don't show up in Claude Code or Codex asking it to "please write me GTA 6 with no mistakes or you go to jail, please."
It feels like a lot of people are using GenAI wrong.
Chances are you're asking it for things more interesting than some domain's hello-world example. Your experience has been mine as well. AI simply can't do anything other than the basics, even if you hold its hand. So its only use case is as a junior dev for senior devs who can't afford junior devs.
I am getting workable code with Claude on a 10 kLOC TypeScript project. I ask it to make plans, then execute them step by step. I have yet to try something larger, or something more obscure.
I feel like there is a nuance here. I use GitHub Copilot and Claude Code, and unless I tell it to not do anything, or explicitly enable a plan mode, the LLM will usually jump straight to file edits. This happens even if I prompt it with something as simple as "Remind me how loop variable scoping works in this language?".
This. I feel like folks are living in two separate worlds. You need to narrow the aperture and take the LLM through discrete steps. Are people just saying it doesn't work because they are pointing it at 1M-LOC monoliths and trying to one-shot a giant epic?
It's all fake coverage, for fake tests, for fake OKRs.
What are people actually getting done? I've sat next to our top evangelist for 30 minutes of pair programming, and he just fought the tool, insisting something was wrong with the DB, while showing off some UI I don't care about.
Like, that seems to be the real issue to me. I never bother wasting time with UI and just write a tool to get something done. But people seem impressed that AI did some shitty data binding to a data model that can't do anything, but is pretty.
It feels weird being an avowed singularitarian yet adamant that these tools suck right now.
> Feedback helps, right. But if you've got a problem where a simple, contained feedback loop isn't that easy to build, the only source of feedback is yourself. And that's when you are exposed to the stupidity of current AI models.
That's exactly the point. Modern coding agents aren't smart software engineers per se; they're very good goal-seekers whose unit of work is code. They need automatable feedback loops.
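To make "automatable feedback loop" concrete, here's a minimal sketch in Python: the test suite, not a human, judges each attempt. `generate_patch` and `apply_patch` are hypothetical stand-ins for whatever agent API you actually use.

    # Minimal sketch of an automatable feedback loop: run the model, run the
    # tests, and feed the failures straight back. `generate_patch` and
    # `apply_patch` are hypothetical stand-ins for your agent of choice.
    import subprocess

    def run_tests():
        # Any deterministic check works here: pytest, a compiler, a linter.
        proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

    def agent_loop(task, generate_patch, apply_patch, max_iters=5):
        feedback = ""
        for _ in range(max_iters):
            patch = generate_patch(task, feedback)  # model proposes a change
            apply_patch(patch)                      # write it into the working tree
            ok, output = run_tests()                # automated judgement, no human needed
            if ok:
                return True
            feedback = output                       # hand the failure back verbatim
        return False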
It helps to write out the prompt in a separate text editor so you can edit it, describe what the input is and what output you want, and try to describe and catch likely or iteratively observed issues.
Try a gamut of sample inputs and observe where it's going awry; then describe the error to it and see what it does.
I have found AI great in a lot of scenarios, but if I have a specific workflow, then the answer is specific and the AI will get it wrong 100% of the time. You have a great point here.
A trivial example is your happy-path git workflow. I want:
- pull main
- make new branch in user/feature format
- Commit, always sign with my ssh key
- push
- open pr
but it always will
- not sign commits
- not pull main
- not know to rebase if changes are in flight
- make a million unnecessary commits
- not squash when making a million unnecessary commits
- have no guardrails when pushing to main (oops!)
- add too many comments
- write overlong commit messages
- spam the PR description with hallucinated test plans
- incorrectly attribute itself as co-author in some guerrilla marketing effort (fixable with config, but whyyyyyy -- also this isn't just annoying, it breaks compliance in a lot of places and fundamentally misunderstands the whole point of authorship, which is copyright -- and AIs can't own copyright)
- not make DCO-compliant commits
...
Commit spam is particularly bad for bisect-based bug hunting and for ref performance at scale. Sure, I can enforce squash-and-merge on my repo, but why am I relying on that if the AI is so smart?
All of these things are fixed with aliases / magit / plain CLI usage, i.e. using the thing the way we have always done it.
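For what it's worth, the happy path above is small enough to script deterministically with no AI in the loop; here's a rough sketch (a hypothetical helper script, assuming SSH commit signing is already configured in git and that the GitHub CLI `gh` is installed and authenticated):

    #!/usr/bin/env python3
    # Rough sketch of the happy-path workflow: pull main, branch as user/feature,
    # make one signed commit, push, open a PR. Assumes commit signing is already
    # configured (e.g. gpg.format=ssh) and that the `gh` CLI is authenticated.
    import subprocess
    import sys

    def run(*args):
        # Echo the command and stop at the first failure.
        print("+", " ".join(args))
        subprocess.run(args, check=True)

    def main(user, feature, message):
        branch = f"{user}/{feature}"
        run("git", "checkout", "main")
        run("git", "pull", "--rebase", "origin", "main")
        run("git", "checkout", "-b", branch)
        run("git", "add", "-A")
        run("git", "commit", "-S", "-m", message)  # one signed commit, short message
        run("git", "push", "-u", "origin", branch)
        run("gh", "pr", "create", "--fill")        # PR title/body from the commit

    if __name__ == "__main__":
        main(*sys.argv[1:4])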
Because it's not smart? I use these things very extensively to great effect, and the idea that you'd think of them as "smart" is alien to me; it seems like it would hurt your ability to get much out of them.
Like, they're superhuman at breadth and speed and some other properties, but they don't make good decisions.
Just a supplementary point: compared to the AI, I'm in the advantageous position that, in cases where it's hard to provide that automatic feedback loop, I can still run and test the code at my discretion, whereas the AI model can't.
Yet. Most of my criticism comes not after running the code, but after _reading_ it. It wrote code. I read it. And I am not happy with it. No need to even run it; it's shit at a glance.
Over the weekend I generated a for-home-use-only PHP app with a popular CLI LLM product. The app met all my requirements, but the generated code was mixed. It correctly used a prepared query to avoid SQL injection. But then, instead of an obvious:
"SELECT * FROM table WHERE id=1;"
it gave me:
$result = $db->query("SELECT * FROM table;");
foreach ($result as $row) {
    if ($row["id"] == 1) {
        return $row;
    }
}
With additional prompting I arrived at code I was comfortable deploying, but this kind of flaw cuts into the total time-savings.
Yeah, you're right, and the snark might be warranted. I should consider it the same as my stupid (but cute) robot vacuum cleaner that goes in random directions but gets the job done.
The thing that differentiates LLMs from my stupid-but-cute vacuum cleaner is that the AI model (OpenAI's, at least) is cocksure and wrong, which is infinitely more infuriating than being a bit clueless and wrong.
I've been trying to solve this by wrapping the generation in a LangGraph loop. The hope was that an agent could catch the errors, but it seems to just compound the problem. You end up paying for ten API calls where the model confidently doubles down on the mistake, which gets expensive very quickly for no real gain.
You can play with the model for free in chat... but if $20 for a coding agent isn't effectively free for your use case, it might not be the right tool for you.
ETA: I've probably gotten $10k worth of junior-dev time out of it this month.
You might get better code out of it if you give the AI some more restrictive handcuffs. Spin up a tester instance and have it tell the developer instance to try again until it's happy with the quality.
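Something along these lines, as a rough sketch: `ask_developer` and `ask_tester` are hypothetical wrappers around two separate model sessions, and the only idea is that the reviewer gates what the developer instance hands back.

    # Rough sketch of a developer/tester pair: one instance writes the code,
    # the other reviews it, and the loop ends only when the reviewer signs off
    # (or we give up). `ask_developer` / `ask_tester` are hypothetical wrappers
    # around two separate model sessions.
    def review_loop(task, ask_developer, ask_tester, max_rounds=3):
        code = ask_developer(f"Implement: {task}")
        for _ in range(max_rounds):
            verdict = ask_tester(
                f"Review this code for the task '{task}'. "
                f"Reply APPROVE or list concrete problems.\n\n{code}"
            )
            if verdict.strip().startswith("APPROVE"):
                return code
            code = ask_developer(
                f"Revise the code. Reviewer feedback:\n{verdict}\n\nCurrent code:\n{code}"
            )
        return code  # best effort after max_rounds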
Skill comes from experience. It takes a good amount of working with these models to learn how to use them effectively, when to use them, and what to use them for. Otherwise, you end up hitting their limitations over and over and they just seem useless.
They're certainly not perfect, but many of the issues that people post about as though they're show-stoppers are easily resolved with the right tools and prompting.
Right. But "prompt" also covers a lot of ground, e.g. planning, tracking tasks, etc. The codex-style frameworks do a good amount of that for you, but it can still make a big difference to structure what you're asking the model to do and let it execute step by step.
A lot of the failures people talk about seem to involve expecting the models to one-shot fairly complex requirements.