I think for people with fairly high experience, the current AI boom is definitely a win: improved productivity on simple tasks and a responsive rubber duck to run your ideas by. However, for new grads, students, and even mid-level devs, whenever I see them with Copilot (and just today, Cursor) out, I cry a bit inside and lower my expectations.
I'm not sure why, but in my experience juniors/students who use AI generally struggle more to debug problems and give up earlier.
Edit: and if I sound arrogant, let it be known that I definitely don't have the experience/intelligence to use AI productively, and thus I've avoided it beyond experimenting with its capabilities.
As someone responsible for the fallout from these tools being used to cut corners, I doubt it. And I'm talking about senior developers here with 10+ years of experience.
People don't bother to understand problems now. Only the happy path is considered. It takes real intelligence to understand the side effects even a simple thing can have. What was a whiteboard session and a thinking process is now blindly trusted to some 3rd party agent.
Consider reliably receiving a message from a queue and processing it at least once. So far I have encountered "at-least-zero-times" processing on two separate occasions, i.e. messages that were never processed at all. That is a big problem. Turns out the code was generated, and no one even knows or cares enough, including the people reviewing it, because their magic friend wrote it and their other magic friend reviewed it, and management is happy that they were sold magic, because some other magic thing wrote some marketing spiel that absolved them from any decision-making responsibility.
In some contexts a mistake like that has an actual tangible capital cost if it's a trade or a transaction in a regulated industry. It's not "just a mistake you can learn from" at that point.
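To make the failure mode concrete, here is a minimal sketch of the ack-ordering bug that usually produces it. The ToyBroker below is a hypothetical in-memory stand-in, not the actual system from the comment or any real client library; the point is only that acknowledging after processing gives at-least-once, while acknowledging first can give exactly the "processed zero times" outcome described above.

    # Sketch only: a toy in-memory broker (hypothetical, not a real client API)
    # showing why ack ordering decides at-least-once vs. lost messages.
    import queue

    class ToyBroker:
        def __init__(self):
            self._ready = queue.Queue()
            self._unacked = {}   # delivered but not yet acknowledged
            self._next_id = 0

        def publish(self, body):
            self._ready.put((self._next_id, body))
            self._next_id += 1

        def receive(self):
            msg_id, body = self._ready.get()
            self._unacked[msg_id] = body   # broker holds it until acked
            return msg_id, body

        def ack(self, msg_id):
            self._unacked.pop(msg_id, None)

        def recover(self):
            # What a real broker does after a consumer crash: redeliver unacked messages.
            for msg_id, body in list(self._unacked.items()):
                del self._unacked[msg_id]
                self._ready.put((msg_id, body))

    def consume_at_least_once(broker, handler):
        msg_id, body = broker.receive()
        handler(body)        # if this raises or the process dies, the message stays unacked...
        broker.ack(msg_id)   # ...and recover() redelivers it: at-least-once

    def consume_zero_times_bug(broker, handler):
        msg_id, body = broker.receive()
        broker.ack(msg_id)   # BUG: acknowledged before any work happened
        handler(body)        # a crash here loses the message for good: processed zero times

The review question that catches this is one line long ("when exactly do we ack?"), which is exactly the kind of question that gets skipped when both the author and the reviewer are a magic friend.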
People tend to lazy, to steal a mathematical expression, and these tools bring the worst out in people.
That's partly why I've only been using these tools for inconsequential tasks whose primary time cost is typing, like throwing together a primitive prototype or mock generator and using my brain to build on it or get the rough structure down. HTML/CSS templating, for example, is still easy to screw up if you actually care about quality, but if you don't then you may as well find the fastest path there.
I agree, though, that the more convenience people rely on, the less willing they seem to sit with a hard problem and learn it the hard way, which is required in so many areas of programming or w/e, or to just grind their brain over possible edge cases or robust ways to implement something.
I've found that AI takes many more edge cases into account than my lazy brain does. It has no problem writing tedious code covering things that are very unlikely to happen. When I see the solution it suggests, it's often more robust than my initial plan.
How is it different from the common situation today, where a piece of code was written long ago by someone who no longer works at the company, so nobody understands how it works?
The old code is probably close enough to correct. It has been battle-tested for a while if it has not been actively worked on. If it has operated for a good while without immediately noticeable issues, it is likely okay for current uses.
It is at best on par with Stack Overflow, and it only works when the LLM has a lot of prior examples it has trained on. Professional coding cannot tolerate introducing bugs that take forever to debug simply because the LLM lacks enough training data.
It also lacks the concept of libraries having different API versions, so you will see the generated code happily mix methods from different versions. And because the models take so long to train, you can't use newer libraries.
This has been my experience with the chat models (and yes, I'm talking about SOTA, both Claude and GPT), but I've found Copilot autocomplete to be an improvement that is worth the cost.
That's not a huge endorsement given that it's not especially expensive, but it definitely decreases the burden of creating a large amount of new code quickly, which makes bootstrapping a new personal project much easier than before. The key is to give it discrete chunks to generate—small helpers, single lines, or test cases—and not expect it to come up with a reasonable architecture or to even generate a single class.
But I wholeheartedly agree about the chat models. Every time I've tried them I've wished I hadn't. The experience of interacting with them is similar to interacting with a junior developer, and it's not worth pairing with a junior if it's not an investment in their training.
They will. And then they’ll learn the fundamentals. I thought I knew most things I needed when I was 20 because who needs fundamentals when you can just use frameworks? Ah well…
I don’t think this is any different from “young devs don’t know shit, they only know Unity but have no clue of how things work in reality.” Developers need to start somewhere and will work their way down the stack. As the height of the stack increases, that journey takes longer. But the height increases because every additional layer increases productivity.
As long as the task remains non-trivial and there is still something to be done – mostly forming a clear idea of what you want to do and what you need, and laying it out, now in English – why would this be any less wrong than any of the other times the incumbents got worried and uphill-both-ways-y?
When has this purity/concept idea ever stood the test of time? Is it only music when it's on tape? Is hiphop even music? Can't you do art with photoshop?
Things evolve and the things that once seemed important won't be. People will be lazy and clever and adapt.
"Sometimes you get a 100-line diff to your code that nails it, which could have taken 10+ minutes before" so it only took him 10 mins to write correct 100 line block without LLM? I guess some people are really 10x.
There is some hilarity in having Karpathy, who has most likely made significant contributions to OpenAI's current offerings, recommend that people ought to use Sonnet instead.
I am whiny about being anti-AI, but no doubt LLMs are well suited for English<->programming translation tasks, which, along with a well-tuned bag of tricks, makes them a useful tool.
But there are two issues the tech community has not spent nearly enough effort discussing:
1) As a big fan of Idris, I am worried that these tools will strongly disincentivize language development: why design an elegant language if an LLM can write the boilerplate faster than you can write a cleaner implementation?
2) I still don't think these tools are even slightly ethical. In 2022 I kicked the tires on ChatGPT-3.5 for F# codegen, and got some truly terrible results. I copy-pasted some lines into GitHub and found the unique repositories which ChatGPT was obviously plagiarizing from, and with 15 seconds of prompt "engineering" I got it to spit out ~200 lines verbatim from my personal F# linear algebra library - the only thing that was changed was stripping out the comments and updating some syntax to F# 4.7. Pure plagiarism. It is especially frustrating that GPT is more likely to plagiarize that library precisely because there aren't very many similar repos on GitHub.
Obviously the plagiarism problem can be fixed. (and it seemingly has been...for F#. Not sure about Idris!) However, it really seems like that sort of RLHF fine-tuning is about covering OpenAI's tracks, not "teaching" the AI how to "generalize." In particular I refuse to use the tool because now instead of reliably getting it to plagiarize from F# developers, I have no clue whatsoever if it's stealing or if it managed to truly autoregress its way into an ethical solution. So instead of rolling the dice on being a graceless scumbag, I'll just take my time writing out my code by hand.
And it was striking that GPT-3.5 had read and memorized more F# than Don Syme has seen in his entire life, yet in response to simple questions it was a mindless plagiarist. It's a stark illustration of why the legal argument that ANN learning = human learning is vacuous, and why OpenAI should lose most of the copyright lawsuits it's facing.
What he describes is why I have stayed away from using these tools so far. I don't want to be exposed to something that is just useful enough to feel indispensable, but still comes with so many drawbacks.
Can't wait for Opus 3.5 or GPT-5, and of course Copilot.
Honestly writing code is so much more fun (especially Go code, if err if err if err..)
I just tried Cursor because of this tweet, and it is really nice; I did pay for it.
I just turn all assistance off when I have to think, because it violently interrupts my thoughts with random suggestions, but it turns out that most of the time I just have to spit out code.
Does anyone have a write-up of the response format the LLM uses in these types of editors? I'm assuming the LLM can't generate an exact diff format without making errors.
Sorry to go meta, but this post has been demoted to page 6 since I first read it about an hour ago, despite being at 80 points after 3 hours. I think it's because people have flagged it. I'm frustrated at how powerful flagging is compared to upvotes for stories. Tiny rant over.
I pity comments like this. Instead of trying to upskill to use the newest programming tools, you are setting yourself up for failure.
Sonnet 3.5 digests close to 400k bytes of text and produces coherent code that works on the first try. If someone says it's not working and they are a professional programmer, get ready to feel like you've been hit by a ton of bricks next year. The productivity boost is only going to accelerate, and those who can't adopt will be left behind.
a) There is no up-skilling needed to use LLMs. They are very basic to use.
b) Many of us have used them for a while now and can speak from experience that they aren't providing a meaningful productivity boost. Simply because they don't work well enough to provide a positive ROI. And no amount of prompting expertise can change that.
c) For me it is junior developers who love these tools because they think it's a shortcut to becoming experienced. But it's akin to cheating. You're not actually learning why and how things are supposed to work. And that will hurt you in professional environments where you often need to explain why you wrote that code and introduced that bug.
Your (a) doesn't square with your (b), because there are anecdotes contrary to yours (the tweet in question and my personal one). I have close to two decades of experience in a variety of languages and frameworks and have never felt this powerful and liberated with any of the previous tools. In the past year I have developed two complex products nearing market launch, with just me working on a part-time basis.
My professional colleagues continue to feel exactly the way you feel and, despite my best efforts, refuse to even bother using these tools for anything. Using LLMs might appear to be simple, and the prompt length might be similar between an experienced user and a naive one, but the way intent is conveyed varies with skill level.
My only complaints about LLMs are:
1) Context is still a limiting factor (so only medium sized projects)
2) I still have to copy-paste the code (no IDE truly helps here)
What has improved in the past 6 months:
Sonnet happened, and I no longer have to worry about the code being wrong or containing obvious mistakes. In many cases where I thought it got it wrong, it turned out to be a clever way to minimize the number of changes needed or to do more with less. We are approaching the point where humans are no longer intelligent enough to appreciate the LLMs.
I look forward to the day that I can be "intelligent enough" to truly appreciate LLMs. Maybe I need to buy a course from someone on X.
And not from months of experience using Claude, where over and over again it will give me algorithms that are wrong, assure me every time that it is right, and do so using versions of libraries that are typically a year or more old.
"There is no up-skilling needed to use LLMs. They are very basic to use."
Hard disagree on that. Using LLMs effectively is deceptively deep. Sure, anyone can throw a prompt at a chatbot - but I've been using them on an almost daily basis for over two years at this point and I still feel like I'm finding out new ways to improve my prompting several times a week.
Many of us professionals share Karpathy's opinion on this and know for a fact that it provides a very meaningful productivity boost. It may not be for everyone, but I absolutely cannot imagine going back, and I can confidently say it's not just junior developers who love these tools.
Why isn't there a single screencast (un-edited, un-cherry-picked) of anyone showing off their 10x productivity boost in a full "typical" coding session?
Having rewatched that myself the other day it's not actually as good an example as I thought - I use Claude 3.5 Sonnet a bit in it (which was released the morning we recorded that video) and then get a bit of benefit out of Val Town's integration with Codeium, which is similar to VS Code Copilot - but not as much of the code in it was LLM-generated as I remembered.
I would point out that the OCR example (and from what I see the series of posts you linked to) aren't "live" coding screen shares and don't convey the nitty gritty of how these things are used and how well they work.
I'd love to see this operationalised as concrete predictions, as one might find on a prediction market! Do you have any specific predictions about programming next year?
I ask (for example) because I suspect shitting out CRUD apps is cheaper via LLM than via human now, and I guess probably most programming work is of that nature, but there are programmers out there whose job is not shitting out CRUD apps, and it's not clear from your statement whether you intend the sentiment to cover those programmers too.
The answer lies in your question. I foresee consolidation in programming languages and frameworks, with compact and well-known ones edging out esoteric and niche ones. In a couple of years' time, I predict there will be new languages specifically targeting LLMs that aren't as human-readable but are extremely compact, similar to bytecode (compactness is preferred because the context size limitation isn't fully going away).
So, in a nutshell, I feel like most things will be LLM-generated, with humans focusing mostly on stitching together system boundaries, plus extreme cases like quant and medical domains where human oversight might be needed.
Let's cross that bridge when we come to it, shall we. Meanwhile, you should be glad we are refusing to use it. If it works as well as you claim, this situation is to your advantage.
Curious to know why you think it's vaporware. Are the latest LLMs like 3.5 Sonnet bad at original programming based on your experience? It hasn't been the case for me when using it for real world projects lately.
Most of the code my friends and I write isn't original. And it's not just people who make $50K/year. Obviously LLM-assisted code writing is still in its infancy, but it has already made a lot of mundane things a breeze. It sucks that you have to know its shortcomings to make it actually useful for yourself (e.g. I won't ask it to write a context-aware function right away, but I know it's great at generating stubs). But we'll get there, I think.
"Andrej Karpathy (born 23 October 1986[2]) is a Slovak-Canadian computer scientist who served as the director of artificial intelligence and Autopilot Vision at Tesla. He co-founded and formerly worked at OpenAI"
I think he should get some cred with such track record.
You didn't learn how to use these tools properly. If you had, you wouldn't have that opinion. Karpathy doesn't just write code snippets for educational purposes. Most of the code he writes is for real-world systems and it's not publicly available.
I'm trying to understand how you can so easily dismiss all the professionals thinking it's useful. A more charitable explanation could be that it's useless for you but not others.