There is a study showing that what the model is doing behind the scenes in those cases is a lot more than just outputting those tokens.
For an LLM, tokens are thought. They have no ability to think, by whatever definition of that word you like, without outputting something. The token only represents a tiny fraction of the internal state changes made when a token is output.
Clearly there is an optimum for each task (not necessarily a global one), and a concrete model on a given task can be arbitrarily far from it. But you'd need to test it out for each case, not just assume that "less tokens = more better". You can be forcing your model to be dumber without realizing it if you're not testing.
High-dimensional vectors are thought (insofar as you can define what that even means). Tokens are a one-dimensional input that navigates the thought, and an output that renders the thought. The "thinking" takes place in the high-dimensional space, not in the one-dimensional stream of tokens.
But aren't the one-dimensional tokens a reflection of the high-dimensional space? What you see is "sure, let's take a look at that", but behind the curtain it's actually an indication that the model is searching a very specific region of latent space, which might be radically different if those tokens didn't exist. Or not. In any case, you can't just make that claim and treat the two processes as isolated. They might be totally unrelated, but they also might be tightly interconnected.
I assume that in practice, filler words do nothing of value. When words add or mean nothing (their weights are basically 0 in relation to the subject), I don't see why they'd affect what the model outputs (except to cause more filler words)?
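As an aside on the "weights are basically 0" intuition: softmax attention can make a filler token's weight small, but never exactly zero, so every emitted token nudges subsequent computation at least a little. A toy sketch with made-up scores, not taken from any real model:

```python
import math

# Softmax never assigns exactly zero probability, so even a "filler"
# token's key receives some attention weight and shifts the weighted sum.
def softmax(scores):
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Pretend the last score belongs to a filler token with a very low score.
weights = softmax([2.0, 1.5, 0.1, -6.0])
assert min(weights) > 0   # the filler still gets a nonzero weight
```

The weight can be vanishingly small, but "basically 0" and "exactly 0" behave differently once you compound across many layers and many tokens.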
Politeness has an impact (https://arxiv.org/abs/2402.14531), so I wouldn't be too quick to make any kind of claim about a technology whose inner workings we don't exactly understand.
The existence of science does not obligate us to either receive a double-blind study of massive statistical significance on the exact question we're thinking about or to throw our hands up in total ignorance and sit in a corner crying about the lack of a scientific study.
It is perfectly rational to rely on experience for what screens do to children when that's all we have. You operate on that standard all the time. I know that, because you have no choice. There are plenty of choices you must make without data to back you up.
Moreover, there is plenty of data on this topic and if there is any study out there that even remotely supports the idea that it's all just hunky-dory for kids to be exposed to arbitrary amounts of "screen time" and parents are just silly for being worried about what it may be doing to their children, I sure haven't seen it go by. (I don't love the vagueness of the term "screen time" but for this discussion it'll do... anyone who wants to complain about it in a reply be my guest but be aware I don't really like it either.)
"Politicians" didn't even begin to enter into my decisions and I doubt it did for very many people either. This is one of the cases where the politicians are just jumping in front of an existing parade and claiming to be the leaders. But they aren't, and the parade isn't following them.
I've been waiting for the article talking about how AI is affecting COBOL. Preferably with quotes from actual COBOL programmers since I can already theorize as well as the next guy but I'm interested in the reports from the field.
While LLMs have become pretty good at generating code, I think some of their other capabilities are still undersold and poorly understood, and one of them is that they are very good at porting. AI may finally offer the way out for COBOL.
You definitely can't just blindly point it at one code base and tell it to convert to another. The LLMs do "blur" the code, I find, just sort of deciding that maybe this little clause wasn't important and dropping it. (Though in some cases I've encountered this, I sometimes understand where it is coming from, when the old code was twisty and full of indirection I often as a human have a hard time being sure what is and is not used just by reading the code too...) But the process is still way, way faster than the old days of typing the new code in one line at a time by staring at the old code. It's definitely way cheaper to port a code base into a new language in 2026 than it was in 2020. In 2020 it was so expensive it was almost always not even an option. I think a lot of people have not caught up with the cost reductions in such porting actions now, and are not correctly calculating that into their costs.
It is easier than ever to get out of a language that has some fundamental issue that is hard to overcome (performance, general lack of capability like COBOL) and into something more modern that doesn't have that flaw.
Nominally, Common Law, the system of law that to a first approximation is used in countries descended from the UK, has a lot of protections of that sort. You can't put "unconscionable" terms in a contract, e.g., it is simply illegal to sell yourself into total slavery in common-law derived systems. All signatories to a contract must consent, must not be under duress, the contract can not be one-sided (this doesn't mean "the contract is 'fair' from a 3rd-party point of view" but "the contract can't result in only one side giving things but the other doesn't"), and a variety of other common sense rules.
In practice, availing yourself of any of these protections is a massively uphill battle. Judges tend to presume that these common law matters are already embedded into the de facto legal system because the people writing the laws already operated under those assumptions while framing the law. Personally, I disagree and think a lot of these protections have eroded away into either nothing, or so little that it might as well be nothing, but you have a 0% chance of drawing me as a judge in your case so that won't help you much if you try.
I think this is a fundamental LLM issue. I recall a paper a while back about what happens when you push LLMs to be too succinct, and the problem is, with the way they are implemented, the only way they can "think" is to emit a token. IIRC it demonstrated that even when the model is just babbling something like "Yeah, let's take a look at the issue you just raised", under the hood that superficially useless output was also changing its state in ways related to solving the problem, not just producing filler.
It helps to understand that, because then you can also not be annoyed by things like "Let's do X. No, wait, X has this problem, let's do Y instead." You might think to yourself, "if X was a bad idea, couldn't it have considered X and rejected it without outputting a token?" The answer is that that sentence was it considering X and rejecting it, and no, there is no way for it to do that without emitting tokens. Thinking is inextricably tied to output for LLMs.
There is even some fairly substantial evidence from a couple of different angles that the thinking output is only somewhat loosely correlated to what the model is "actually" doing.
Token efficiency is an interesting question to ponder and it is something to worry about that the providers have incentives to be flabby with their tokens when you're paying per token, but the question is certainly not as easy as just trying to get the models to be "more succinct" in general.
I often discuss a "next gen" AI architecture after LLMs and I anticipate one of the differences it will have is the ability to think without also having to output anything. LLMs are really nifty but they store too much of their "state" in their own output. As a human being, while I find like many other people that if I'm doing deep thinking on a topic it helps to write stuff down, it certainly isn't necessary for me to continuously output things in order to think about things, and if anything I'm on the "absent minded"/"scatterbrained" side... if I'm storing a lot of my state in my output for the past couple of hours then it sure isn't terribly accessible to my conscious mind when I do things like open the pantry door only to totally forget the reason I had for opening it between having that reason and walking to the pantry.
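One way to make the "state lives in the output" point concrete: in the standard autoregressive loop, the only thing carried forward between steps is the growing token sequence itself (plus a cache of per-token activations). This toy loop has the same shape; `ToyModel` is a made-up deterministic stand-in, not a real LLM:

```python
# Toy illustration of the autoregressive loop: there is no "just think"
# step, only one forward pass per emitted token, and the emitted tokens
# ARE the state carried between steps.
class ToyModel:
    eos = 0  # stand-in end-of-sequence token

    def next_token(self, tokens):
        # Fake "forward pass": the result depends on everything emitted
        # so far, including any filler tokens.
        return (sum(tokens) % 5) + 1 if len(tokens) < 8 else self.eos

def generate(model, prompt, max_new=16):
    tokens = list(prompt)
    for _ in range(max_new):
        t = model.next_token(tokens)  # computation happens only here
        if t == model.eos:
            break
        tokens.append(t)  # feeding the output back in IS the state update
    return tokens
```

A hypothetical "next gen" architecture of the kind described above would add some loop-carried state besides `tokens`, so the model could iterate internally without appending anything.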
The people spamming curl did step one, "write me a vulnerability report on X", but skipped step two, "verify for me that it's actually exploitable". Tack on a step three where a reasonably educated user in the field of security research does a sanity check on the vulnerability and its exploit as well, and you'll have a pipeline that doesn't generate a ton of false positives. The question then becomes how cost-effective it is, given the tokens and the still-nonzero human time involved.
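That three-step pipeline is just a filter. In this sketch every function is a hypothetical stand-in: step 1 would be an LLM call, step 2 would actually run the proof-of-concept, and step 3 would be a human expert:

```python
# Stand-in implementations: in practice step 1 is an LLM call, step 2
# executes the proof-of-concept, and step 3 is a human reviewer.
def draft_report(finding):
    return {"finding": finding, "poc_runs": finding.get("poc_runs", False)}

def verify_exploitable(report):
    return report["poc_runs"]  # only keep findings whose PoC actually fires

def human_sanity_check(report):
    return True  # placeholder for the educated-human review step

def triage(finding):
    report = draft_report(finding)       # step 1: write the report
    if not verify_exploitable(report):   # step 2: verify exploitability
        return None                      # this is the step the spammers skipped
    if not human_sanity_check(report):   # step 3: human sanity check
        return None
    return report                        # only verified reports get filed
```

The economics question is then how many candidate findings die at step 2 versus how much steps 2 and 3 cost per candidate.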
If you want to understand a fairly non-trivial amount of the brokenness of the world, pondering the implications of "Hey, what if we thought about what our incentives will actually do instead of what we want them to do, and made plans based on that?" being a brilliant and bold breakthrough in the world of governance rather than common sense can take you a long way.
That's the fun thing about common sense, everybody has a different definition of it.
The only way to know what your incentives will do is let them play out. Now, you can make educated guesses on what will happen, but much like computer security, people find surprising ways to break things.
You sound like you're citing the general Internet understanding of "fair use", which seems to amount to "I can do whatever I like to any copyrighted content as long as maybe I mutilate it enough and shout 'FAIR USE!' loudly enough."
On the real measures of "fair use", at least in the US: https://fairuse.stanford.edu/overview/fair-use/four-factors/ I would contend that it absolutely face plants on all four measures. The purpose is absolutely in the form of a "replacement" for the original, the nature is something that has been abundantly proved many times over in court as being something copyrightable as a creative expression (with limited exceptions for particular bits of code that are informational), the "amount and substantiality" of the portions used is "all of it", and the effect of use is devastating to the market value of the original.
You may disagree. A long comment thread may ensue. However, all I really need for my point here is simply that it is far, far from obvious that waving the term "FAIR USE!" around is a sufficient defense. It would be a lengthy court case, not a slam-dunk "well duh it's obvious this is fair use". The real "fair use" and not the internet's "FAIR USE!" bear little resemblance to each other.
A sibling comment mentions Bartz v. Anthropic. Looking more at the details of the case I don't think it's obvious how to apply it, other than as a proof that just because an AI company acquired some material in "some manner" doesn't mean they can just do whatever with it. The case ruled they still had to buy a copy. I can easily make a case that "buying a copy" in the case of a GPL-2 codebase is "agreeing to the license" and that such an agreement could easily say "anything trained on this must also be released as GPL-2". It's a somewhat lengthy road to travel, where each step could result in a failure, but the same can be said for the road to "just because I can lay my hands on it means I can feed it to my AI and 100% own the result" and that has already had a step fail.
"Real" fair use is perhaps one of the most nebulous legal concepts possible. I haven't dived deep into software, but a cursory look at how it "works (I use that term as loosely as possible)" in music with sampling and interpolation etc immediately reveals that there's just about nothing one can rely on in any logical sense.
I'm not really sure why you think my comment specifically citing the recent rulings by Judge Alsup and also the prior history with respect to the Google Books project is somehow declaring "I can do whatever I like to any copyrighted content", but I assure you I'm not. I'm very specifically talking about the various cases that have come about in the digital age dealing with fair use as it has been interpreted by US courts to apply to the use of computers to create copies of works for the purposes of creating other works.
I'm referring to the long history of carefully threaded fair use rulings and settlements, many of which we as an industry have benefitted greatly from: determinations that cloning a BIOS can be fair use (see the IBM PC BIOS cloning efforts, but also Sony v. Connectix), that cloning an entire API for the purposes of creating a parallel competitive product can be (Google v. Oracle), that digitizing books for the purposes of making those books searchable and even displaying portions of them to users can be (Authors Guild v. Google), and even that your cable company offering you "remote DVR" copying of broadcast TV can be (20th Century Fox v. Cablevision). Time and again the courts have found that copyright, and especially copyright with respect to digital transformations, is far more limited than large corporations would prefer. Further, they have found in plenty of cases that even a direct 1:1 copy of a source can be fair use, let alone copies which are "transformative", as LLM training was found to be in Bartz.
Realistically, I don't see how anyone can have watched the various copyright cases that have been decided in the digital age, and seen the battles that the EFF (and a good part of the tech industry) have waged to reduce the strength of copyright and not also see how AI training can very easily fit within that same framework.
Not to cast aspersions on my fellow geeks and nerds, but it has been very interesting to me to watch the "hacker" world move from "information wants to be free" to "copyright maximalists" once it was their works that were being copied in ways they didn't like. For an industry that has brought about (and heavily promoted and supported) things like DeCSS, BitTorrent, Handbrake, Jellyfin/Plex, numerous emulators, WINE, BIOS and hardware cloning, ad blockers, web scrapers and many other things that copyright owners have been very unhappy about, it's very strange to see this newfound respect for the sanctity of copyright.
> I can easily make a case that "buying a copy" in the case of a GPL-2 codebase is "agreeing to the license" and that such an agreement could easily say "anything trained on this must also be released as GPL-2".
And I would argue that obtaining a legal copy of the GPL source to a program requires no such agreement. By downloading a copy of a GPLed program, I am entitled, by the terms under which that software was distributed, to obtain a copy of the source code. I do not have to agree to any other terms in order to obtain that source code; downloading from someone authorized to distribute that code is in and of itself sufficient to entitle me to it. You cannot, by the very terms of the GPL itself, deny me a copy of the source code for GPL software you have distributed to me, even if you believe I intend to make distributions that are not GPL compliant. You can decline to distribute the software to me in the first place, but once you have distributed it to me, I am legally entitled to a copy of the source code. From there, now that I have a legal copy, the question becomes: is making additional copies for the purposes of training an AI model fair use? So far, the most definitive case we have on the matter (Bartz) says yes, it is.
So either we have to make the case that the original copy was somehow acquired from a source not authorized to make that copy, or we have to argue that the output of the AI model or the AI model is itself infringing. Given the ruling that copies made for training an AI model was ruled "exceedingly transformative and was a fair use under Section 107 of the Copyright Act"[1] it seems unlikely that the AI model itself is going to be found to be infringing. That leaves the output of the model itself, which Bartz does not rule on, as the authors never alleged the output of the model was infringing. GPL software authors might be able to prevail on that point, but they would have a pretty uphill battle I think in demonstrating that the model generated infringing output and not simply functional necessary code that isn't covered by copyright. The ability of code to be subject to copyright has long been a sort of careful balance between protecting a larger creative idea, and also not simply walling off whole avenues of purely functional decisions from all competitors.
I am polite when using AI, not because I mistake it for a human, but because I'm deliberately keeping it in the "professional colleague" persona. Tell it to push back, and then thank it when it finds one of your errors. I may put in a small self-deprecating joke from time to time. It keeps the "mood" correct.
Another way you can think of it is that when you're talking to an AI, you're not talking to a human, you're talking to a distillation of humanity, as a whole, in a box. You want to be selective about which portion of humanity you are leading to be dominant in a conversation for some purpose. There's a lot in there. There's a lot of conversations where someone makes a good critical point and a flamewar is the response. A lot of conversations where things get hostile. I'm sure the subsequent RLHF helps with that, but it doesn't hurt anything to try to help it along.
I see people post their screenshots of an AI pushing back and asking the user to do it or some other AI to do it, and while I'm as amused as the next person, I wonder what is in their context window when that happens.
> you're talking to distillation of humanity, as a whole, in a box.
This is an aside, but my impression is that it is a very selective and skewed distillation, heavily colored by English-language internet discourse and other lopsided properties of its training material, and by whoever RLHF’d it. Relatively far away from being representative of the whole of humanity.
Yes, absolutely. I'm not trying to claim it's some sort of unbiased sample, but more get across the idea that modelling AI as a person, a singular person, in your head is inaccurate. That singular person would have a stereotypical, Hollywood-esque multiple personality disorder like no actual human on Earth has ever had. You need to be thinking about not just what the person-like thing in front of you is doing, but how to craft which person you're ending up with.
Agreed, putting effort into my side of the role-play almost always improves the model's responses. The attention required to do that also makes it more likely that I'll notice when the conversation first starts going off the rails: when it hits the phase transition (https://arxiv.org/abs/2508.01097). It does still seem important to start new chats regularly, regardless of growing context sizes.
A similar approach works for me. But then I also run separate checks at the end of the session, basically questioning the premise and logic used, for most things except brainstorming, where I allow more leeway. You can ask to be challenged, and challenged effectively, but I wonder how many people do that.
Are you paying for the dial up service? If not, gosh, you seem to be out of luck.
(Fresh out of college while the dot-com crash was still in effect, I briefly took a job at a local phone company. Their primary income was from people who were still paying 1996-ish prices for T1 lines, at hundreds and hundreds of dollars a month. Meanwhile I would go home to my cable modem, which was about 4 times faster for ~$50/month. Now, technically, the T1s were dedicated bandwidth and of course my cable modem was shared, but it was still a terrible deal for them. And they weren't even getting subsidized computers out of it!)