Understanding (not necessarily reading) always was the real work. AI makes people less productive because it's speeding up the thing that wasn't hard (generating code), while generating additional burden on the thing that was hard (understanding the code).
There are many cases in which I already understand the code before it is written. In these cases AI writing the code is pure gain. I do not need to spend 30 minutes learning how to hold the bazel rule. I do not need to spend 30 minutes to write client boilerplate. List goes on. All broad claims about AI's effects on productivity have counterexamples. It is situational. I think most competent engineers quietly using AI understand this.
no, it isn't. unless the generated code is just a few lines long, and all you are doing is effectively autocompletion, you have to go through the generated code with a fine-toothed comb to be sure it actually does what you think it should do and there are no typos. if you don't, you are fooling yourself.
kind of, except that when i review a code submission to my project i can eventually learn to trust the submitter, once i realize they write good code. a code review is to develop that trust. AI code should never earn that trust, and any code review should always be treated as if it is from a first time submitter that i have never met before. the risk is that this does not happen, and that we believe AI code submissions will develop like those of a real human. they won't. we'll develop a false sense of security, a false sense of trust. instead we should always be on guard.
and as i wrote in my other comment, reviewing the code of a junior developer includes the satisfaction of helping that developer grow through my feedback. AI will never grow. there is no satisfaction in reviewing its code. instead it feels like a sisyphean task, because the AI will make the same mistakes over and over again, and make mistakes a human would be very unlikely to make. unlike with human code, with AI code you have to expect the unexpected.
Broadly I agree with you. I think of it in terms of responsibility. Ultimately the commit has my name on it, so I am the responsible party. From that perspective, I do need to "understand" what I am checking in to be reasonably sure it meets my professional standards of quality.
The reason I put scare quotes on "understand" is that we need to acknowledge that there are degrees of understanding, and that different degrees are required in different scenarios. For example, when you call syscall(), how well do you understand what is happening? You understand what's in the manpage; you know that it triggers a switch to kernel space, performs some task, returns some result. Most of us have not read the assembly code, we have a general concept of what is going on but the real understanding pretty much ends at the function call. Yet we check that in because that level of understanding corresponds to the general engineering standard.
In some cases, with AI, you can be reasonably sure the result is correct without deeply understanding it and still meet the bar. The bazel rule example is a good one. I prompt, "take this openapi spec and add build rules to generate bindings from it. Follow existing repo conventions." From my years of engineering experience, I already know what the result should look like, roughly. I skim the generated diff to ensure it matches that expectation; skim the model output to see what it referenced as examples. At that point, what the model produced is probably similar to what I would have produced by spending 30 minutes grepping around, reading build rules, et cetera. For this particular task, the model has saved me that time. I don't need to understand it perfectly. Either the code builds or it doesn't.
For other things, my standard is much higher. For example, models don't save me much time on concurrent code because, in order to meet the quality bar, the level of understanding required is much higher. I do need to sit there, read it, re-read it, chew on the concurrency model, et cetera. Like I said, it's situational.
There are many, many other aspects to quantifying the effects of AI on productivity; code quality is just one of them. It's very holistic and dependent on you, how you work, what domain you work in, the technologies you work with, the team you work on, so many factors.
The problem is, even if all that is true, it says very little about the distribution of AI-generated pull requests to GitHub projects. So far, from what I’ve seen, those are overwhelmingly not done by competent engineers, but by randos who just submit a massive pile of crap and expect you to hurry up and merge it already. It might be rational to auto-close all PRs on GitHub even if tons of engineers are quietly using AI to deliver value.
> There are many cases in which I already understand the code before it is written. In these cases AI writing the code is pure gain.
That's only true if the LLM understands the code in the same way you do - that is, it shares your expectations about architecture and structure. In my experience, once the architecture or design of an application diverges from the average path extracted from training data, performance seriously degrades.
You wind up with the LLM creating duplicate functions to do things that are already handled in code, or using different libraries than your code already does.
Unless you have made some exceptional advances in the LLM agents (if you have, send me the claude skill?), you can't predict it.
If it was predictable like a transpiler, you wouldn't have to read it. You can think of it as a pure gain, but you are just not reading the code it's outputting.
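The duplicate-function failure mode described above is one of the few that can be flagged mechanically. A minimal sketch, assuming Python sources and the standard `ast` module; `duplicate_functions` and the `SAMPLE` code are hypothetical names made up for illustration. It only catches functions whose arguments and bodies are structurally identical, so a copy that was reworded or that uses a different library will slip through and still has to be caught by reading the diff.

```python
import ast
from collections import defaultdict


def duplicate_functions(source):
    """Group the names of functions whose arguments and bodies are structurally identical."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # Dump the arguments and body but not the name, so a renamed copy still matches.
            key = ast.dump(node.args) + "".join(ast.dump(stmt) for stmt in node.body)
            groups[key].append(node.name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Hypothetical code: one existing helper, one re-invented copy, one unrelated function.
SAMPLE = '''
def slugify(text):
    return text.strip().lower().replace(" ", "-")

def make_slug(text):
    return text.strip().lower().replace(" ", "-")

def shout(text):
    return text.upper()
'''

print(duplicate_functions(SAMPLE))
```

A check like this could run before merge, but it is a tripwire for the crudest case, not a substitute for review.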
Very much disagree. When I type code I don't just press keys, I read, think, organize... and the interplay between acting, seeing, watching, reevaluating was the fun part. There's a part of you that disappears if you only review the result of a generator. That's why it's less interesting imo
As not all codebases are well-written, I once found it useful to get an LLM to produce code that does X, essentially distilling it from a codebase that does XYZ. I found that reviewing the code the LLM produced, after feeding the original codebase into the context, was easier than going through the (not very well-written) codebase myself. Of course this was just the starting point: there were a ton of things the LLM "misunderstood", and then there was a ton of manual work, but it is an (admittedly rarer) example for me where "AI-generated" code is easier to read than code written by (those) humans, and it was actually useful having that at that point.
> Understanding (not necessarily reading) always was the real work.
Great comment. Understanding is mis-"understood" by almost everyone. :)
Understanding a thing equates to building a causal model of the thing. And I still do not see AI as having a causal model of my code even though I use it every day. Seen differently, code is a proof of some statement, and verifying the correctness of a proof is what a code-review is.
There is an analogue to Brandolini's bullshit asymmetry principle here. Understanding code is 10 times harder than reading code.
Which is harder, writing 200 lines of code or reading 200 lines of code someone else wrote?
I pretty firmly find the latter harder, which means for me AI is most useful for finessing a roughly correct PR rather than writing the actual logic from scratch.
It makes a great code reading tool if you use it mindfully. For instance, you can check the integrity of your tests by having it fuzz the implementation and ensure the tests fail and then git checkout to get clean again.
Not at all. Curious: are you programming anything with AI, and what does your output look like? Are you excited about anything you're building, staying away, or both? And is there anything you're looking forward to building in this world, whether with a team, with AI, or on your own?
> AI makes people less productive because it's speeding up the thing that wasn't hard (generating code), while generating additional burden on the thing that was hard (understanding the code).
Only if the person doesn't want the AI to help in understanding how it works, in which case it doesn't matter whether they use AI or not (except that without it they couldn't push some slop out the door at all).
If you want that understanding, I find that AI is actually excellent with it, when given proper codebase search tools and an appropriately smart model (Claude Code, Codex, Gemini), easily browsing features that might have dozens of files making them up - which I would absolutely miss some details of in the case of enterprisey Java projects.
I think the next tooling revolution will probably be automatically feeding the model all of the information about how the current file fits within the codebase - not just syntax errors and automatically giving linter messages, but also dependencies, usages, all that.
In my eyes, the "ideal" code would be simple and intuitive enough that you don't actually need to spend hours to understand how a feature works OR use any sort of AI tool, or codebase visualization as a graph (dependency and usage tracking) or anything like that - it just seems that you can't represent a lot of problems like that easily, given time constraints and how badly Spring Boot et al. fuck up any codebase they touch with accidental complexity.
But until then, AI actually helps, a lot. Maybe I just don't have enough working memory (or time) to go through 30 files and sit down and graph it out in a notebook like I used to, but in lieu of that an AI generated summary (alongside docs/code tests/whatever I can get, but seems like humans hate writing docs and ADRs, at least in the culture here) is good enough.
At the same time, AI will also happily do incomplete refactoring or not follow the standards of the rest of the codebase and invent abstractions where it doesn't need any, if you don't have the tooling to prevent it automatically, e.g. prebuild checks (or the ability to catch it yourself in code review). I think the issue largely is limited context sizes (without going broke) - if I could give the AI the FULL 400k SLoC codebase and the models wouldn't actually start breaking down at those context lengths, it'd be pretty great.
Yeah I have always seen PRs from new contributors as having (on average) negative value but being an investment into a hopefully future positive contributor. I don't have that optimism for contributors that start out with AI slop.