Hacker News | tensor's comments

I still find the idea that "learning" from code is "stealing" kind of ridiculous.

The "learning" isn't learning really. I mean it might be, but if you define learning to be a human endeavor than AI can't learn.

It's perfectly reasonable to say it's okay for humans to do something but not okay for a computer program to do the same thing. We don't have to equate AI to humans; that's a choice, and usually a bad one.


It's also perfectly reasonable to say it's ok for a program or machine to do the same thing as a human. This has been the basis for the technological revolution since the dawn of technology.

It's legal and perfectly reasonable for a human being to combine organic fuels with oxygen from the air to create energy and CO2. Any law restricting that would be the worst form of tyranny.

It would not be reasonable to allow machines to do that at unlimited scale without restrictions.

(Hopefully the fossil fuels industry won't draw inspiration from the legal arguments made by AI companies...)


> It's legal and perfectly reasonable for a human being to combine organic fuels with oxygen from the air to create energy and CO2.

Is there any line past which it becomes unreasonable?

> It would not be reasonable to allow machines to do that at unlimited scale without restrictions.

If the machines were a replacement for a damaged respiratory system in a human, would it be reasonable?

What about if the machine were being used by a human to do something else that was important?

Where is the line where it becomes reasonable?


> Is there any line past which it becomes unreasonable?

That's exactly the question we should be asking about AI and fair use.


Are you refusing to engage with your own metaphor?

It was only to illustrate that human rights don't automatically apply to machines. Let's not read too much into it.

You made a claim and used a metaphor to demonstrate it. I asked a very simple question about the bounds of the metaphor, and thus of the claim. You are dodging the questions, which means you cannot defend the logic of your claim. Thus you have forfeited the validity of your claim, and 'human rights don't automatically apply to machines' has not been illustrated.


If one defines 'flying' to be a bird's endeavor, then humans can't fly.

Now, if you'll excuse me, I need to catch a metal shuttle that chucks itself through the air on wings.


Sure, as a word it can be broad; as a concept in our legal system it should be much more nuanced.

The relevant extension of your analogy is: should birds be required to obey FAA rules? Or should plane factories be protected as nesting sites?



Yes, I guess there's also no such thing as stealing in torrents, since the computer "learns" the data and returns it in a transcoded fashion, so it's technically not a reproduction. Yes, LLMs can reproduce passages from copyrighted works verbatim, but that's only because they "learned" them and are just telling you what they "know".

The mental calisthenics required to justify this stuff must be exhausting.


> The mental calisthenics required to justify this stuff must be exhausting.

It's only exhausting if you think copyright ever reasonably settled the matter of ownership of knowledge and want to morally justify an incoherent set of outcomes that you personally favor. In practice it's primarily been a tool for the powerful party in any dispute to hammer others for disrupting their business model. I think that's pretty much the only way attempting to apply ownership semantics to knowledge or information can end up.


Correct.

Knowledge consists of, roughly speaking, thoughts.

(a "justified true belief" - per https://plato.stanford.edu/entries/knowledge-analysis/ - is a kind of thought)

The "thinking" part of a "thinking being" - that also consists of thoughts.

If your knowledge is someone's property, you are someone's property.

A society where all knowledge is proprietary is a society of ubiquitous slavery.

Maybe multi-layered, maybe fractional, maybe with a smiley-face drawn on top.

Doesn't matter.


Humans have been known to recite entire parts from plays from memory, live in front of audiences even.

And they are legally required to license the play to do that, if it's still in copyright.

Only to perform it, not learn it.

And LLMs perform when you prompt them.

This is a perfect example of 'begging the question': arriving at a conclusion from a premise that is simply assumed to be true without evidence. Your reductio does not actually demonstrate that copyright applies to LLMs, because you did not demonstrate how transcoding is comparable to inference, only that LLMs can reproduce some passages from copyrighted works. You could also produce passages from copyrighted works by generating enough random sequences of words, but no one is arguing that is comparable to transcoding. The idea that people who do not share this conclusion are engaging in motivated reasoning rests only on your assumption, has no logical backing, and is therefore begging the question.

I think that it's absurd that we've jumped to the conclusion that backpropagation in neural networks should be legally treated the same as human learning.

I mean, I don't think I could find a better description for following the derivatives of the error in reproducing a set of works than creating a "derivative work".
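
For anyone who hasn't seen it spelled out, that's not just a figure of speech: the training objective literally is to follow the derivative of the error in reproducing the training data. Here's a toy sketch of that idea in Python, with a single weight and plain gradient descent standing in for any real model's pipeline:

    # Toy version of "following the derivatives of error in reproducing a set of works":
    # fit a single weight w so that w * x reproduces the training targets y.
    xs = [1.0, 2.0, 3.0]
    ys = [2.0, 4.0, 6.0]   # stand-in for the "works" being reproduced

    w, lr = 0.0, 0.05
    for _ in range(200):
        # derivative of the mean squared reproduction error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad     # step against the gradient; backpropagation does this per weight

    print(round(w, 3))     # ~2.0: the weight that best reproduces the training data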


>> ... we've jumped to the conclusion that backpropagation in neural networks should be legally treated the same as human learning.

I agree. However, the reverse also likely holds, i.e., it cannot currently be denied that learning in humans is different from learning in artificial neural networks, at least from the point of view of producing works that mix ideas/memes from several works processed/read. Surely, as the article says, copyright law talks exclusively about humans, not machines, not animals.


I understand the article - the point about 'learning' is that if the model and its outputs are derivative works, then the copyright belongs to the human creators of the works it was trained on.

Edit: Or, perhaps put more pseudo-legally, the created works infringe on the copyrights of the original human creators.


The part I agree with is that copyright law calls out humans specifically as the potential owners of copyright. So what you suggest seems to be the only way out. Calling out humans could imply that when a human reads a thousand books and then writes something based on them, but which is not a substantial copy of anything explicitly read, that human owns the copyright to the text written. Whereas, if an artificial neural network does the same (hypothetically writing the same text), it would not.

The above does not follow from, imply or conclude anything about learning in artificial neural networks and humans being similar or dissimilar.


I find it more ridiculous to equate the act of a human learning with for-profit AI training without recompense to the authors of the training material.

Learning, probably not.

Copy/pasting at scale, yes


It is learning though. It’s not just copying the code.

Code gets turned into tokens and then the model learns the next most likely token.

The issue that I see most people talk about is the scale at which it is learnt.

A human will learn from other people's code, but not from every person's code.
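
For concreteness, here's a rough sketch of what "turn code into tokens and learn the next most likely token" means, using a toy bigram counter instead of a real transformer (the tokenizer and corpus are deliberately simplistic stand-ins):

    from collections import Counter, defaultdict

    # Stand-in for "other people's code"; a real model trains on vastly more text
    corpus = "def add(a, b):\n    return a + b\n"

    tokens = corpus.split()        # crude whitespace tokenizer (real models use BPE or similar)
    counts = defaultdict(Counter)  # counts[prev][nxt] = how often nxt follows prev

    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

    def predict_next(prev):
        # return the most likely next token seen after `prev` during training
        return counts[prev].most_common(1)[0][0]

    print(predict_next("return"))  # -> "a"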


The issue is that of copyright law WRT derivative works. Machine transformations of original works do not create a new copyright for the person that directed the machine transformation. That's why you can't pirate a bunch of media by simply adding a red pixel to the right-hand corner or by color-shifting the video.

Copyright law is very clear that if a machine does it, the original copyright on the input is kept. This is why your distributed binaries are still copyrighted: the machine transformed, very significantly, the source code into a binary, and the copyright is maintained throughout.

It would be inconsistent for the courts to suddenly decide that "actually, this specific type of machine transformation is actually innovative."

I know this is generally really bad for the AI industry, so they just ignore it until a court tells them they can't anymore. And they might get away with it as I don't have faith that the courts will be consistent.


Shredding is a machine transformation. Does it mean that shreds retain original copyright even if the content can't be restored and the provenance can't be traced? Just an example that treating all machine transformations equally with no regard to the specifics doesn't make much sense.

And the specifics of autoregressive pretraining are that it is lossy compression. Good luck finding which copyrighted materials have made it into the final weights.


> Does it mean that shreds retain original copyright even if the content can't be restored?

Yup, it absolutely does. In fact, that's why you are still violating copyright law by using BitTorrent even though each of the users is only giving out a small slice or shred of the original content.

The US has a recognized defense for cases like shredding, called "Fair Use", but that doesn't mean or imply that a copyright is void simply because of a fair use claim.

> And the specifics of autoregressive pretraining are that it is lossy compression.

That doesn't matter. Why would it? If I take a FLAC recording and change it to an MP3, the fact that it was a lossy transform doesn't suddenly give me the legal right to distribute the MP3.

> Good luck finding which copyrighted materials have made it into the final weights.

That's what the NYT v. OpenAI lawsuit is all about. And for earlier models they could, in fact, pull out full NYT articles which proved they made it into the final weights.

Further, the NYT is currently in discovery, which means OpenAI must open up to the NYT what goes into their weights. That's a move that, if OpenAI loses, other litigants can also use, because there's a real good shot that OpenAI included their works in the dataset as well.


> Yup, it absolutely does

Well, it's not the first time the law has contradicted the laws of nature (to the entertainment of future generations). BitTorrent is not a relevant example, because the system is designed to restore the work in its fullness.

> in fact, pull out full NYT articles

That's when they used their knowledge of the exact text they wanted to "retrieve" to get the text? It wouldn't be so efficient with a random number generator, but it's doable.


> BitTorrent is not a relevant example, because the system is designed to restore the work in its fullness.

You can restore shredded documents with enough time and effort. And if you did that and started making photocopies, even if they are incomplete, you would run afoul of copyright law.

BitTorrent is a relevant example because it shows that shredding doesn't destroy copyright.

Remember, copyright is about the right to copy something. Simply shredding or destroying a thing isn't applicable to copyright. Nor is giving that thing away. What's applicable is when you start to actually copy the thing.


I meant idealized shredding: a destructive transformation, which is still a machine transformation (think blender instead of shredder). If you need exact knowledge of a thing to make its (imperfect) copy using some mechanism, that doesn't mean the mechanism violates copyright.

EDIT: I'm not saying that neural networks can't rote-learn extensive passages (it's an effect of data duplication). I'm saying that they are not designed to do that, and that it's possible to prevent it (as demonstrated by the latest models).


I'd assume it's still a copyright violation if you copied and distributed the shredded copy.

The way I arrive at that is imagine you add just 1 pixel of static to a video, that'd still be a copyright violation. Now imagine you slowly keep adding those random pixels. Eventually you get to the point where the whole video is just static, but at some point it wasn't.

Now, would any media company or court sue over that? Probably not. But I believe that still falls under copyright (but maybe fair use?).

The issue with neural networks is they aren't people. Even when you point your LLM at a website and say "summarize this", the output of that summarization would be owned by the website itself, by nature of it being a machine-transformed work.

Remember, it's not just rote recitation that violates the law; any transformation counts as well. The fact that AI companies are preventing recitation doesn't really solve the problem that they are in fact transforming multiple copyrighted works into their responses.


When you point your browser at a website, the browser creates a (transformed) local copy of the information that is owned by the website itself. The browser needs to do that to render the website on your screen. Is it a violation of copyright (one that the website is willing to tolerate because it profits from advertisements)?

No, because your browser is dealing with the distribution of data in a way intended by the copyright holder. You also aren't redistributing the webpage after rendering. Client-side modifications fall under fair use, which is what keeps the likes of ad blockers and other page modifiers legal.

What would violate copyright is if you took that rendered page, turned it into a jpeg, and then hosted that jpeg from your own servers. That's the copying that would run afoul of copyright law.


A human is not a commercial product. Here we have a commercial product that was created using a lot of various copyrighted and protected IP, without licensing agreements, without paying, without even citing it.

Copy/pasting at scale is how tons of software has been written for a long time, or have we all forgotten the jokes people used to make about StackOverflow?

If that were the case, then imagine having to give it back!

If you can set a copyright trap and an LLM reproduces it, I think it's pretty clear-cut that it's more than just "learning".

I have seen LLMs do all sorts of crap which was clearly reproduction of training material.

This is also why people are most impressed with how much better it is at reproducing boilerplate than at producing, say, imaginative new ideas.


Remember last year (?) when one of the major AIs produced a bit of code that included Jeff Geerling's name in a comment?

If I “learned” your essay and handed it in, would you be happy with that?

The US is sure becoming an unfree scary place just like Russia. Keep it up following those role models!

>It's literally the largest registrar in the world, by a large margin. When you're a business and want something reliable, picking the most popular provider is usually a strategy that works decently well. They're more likely to have established processes that work for all sorts of cases.

It's also literally one of the most criticized and awful registrars in the world, by a large margin. If decades of stories like this don't convince you to go with a more reliable registrar then I have very little sympathy.

This story is not egregious, it's in fact typical of GoDaddy. Every so often we get a HN post with a GoDaddy horror story. You'd think people would have learned by now.


Oh so only 1/2 to 3/4 of them were terminated far outside of norms. I guess only 50%-75% corrupt anti-science activity is totally ok.

> Indeed, even if one isn't partial to China, there's reasons to be glad that an increasingly hostile US has powerful competition.

This is how I see it. The US has openly threatened multiple times to annex my country, and has repeatedly threatened every western nation. Letting the US have a monopoly on... well... anything is really bad for the world. The more countries that have their own production for various critical things like computer chips, medicine, etc., the better it is for the world, as it distributes power.

People in the US don't seem to understand that with the current administration the US is seen as a potentially very hostile nation. While I don't think China is a friend to Canada or the west, at least it provides alternatives when the US tries to use its monopolies against us. And vice versa too.

>Building a frontier model would be lobbing money into the incinerator for something that will be outdated tomorrow. European investors are too careful for that - and in this case seem to be right.

Strong disagree here. Mistral does great work; in the long term, being a few months or even a year behind is a non-issue. Also, Cohere just merged with Aleph Alpha to continue producing foundational models. It's extremely important that the middle powers continue to do this.


I see young people advocating for socialism a lot in Canada, but rarely communism as in communist Russia and communist China. As others have said, old style communism isn't even around anymore. Russia is a fake democracy and China is a strange blend of one party rule and capitalism.

I don't think it does anyone any good to throw around naive and simple terms like communism. Focus on issues like public healthcare, breaking monopolies, basic incomes, and so on. We'll get along a lot better that way.


Canada has our own history of socialism in the form of crown corps and healthcare. Why wouldn't we lean into our own successful practices?

Because they'll make you worse off the more you scale them up. It's like pointing out that a drink of alcohol with a friend led to positive results, so why not lean heavily into drinking? And the answer is that it's something people enjoy that can be tolerated in small amounts, but it isn't much of a strategy if the goal is a happy, healthy outcome.

That's ridiculous. The countries with the highest quality of living all have strong social programs. If you want an analogy for alcoholism, look at the US. Capitalism works here, so let's use it everywhere!

I'm tempted to copy what you wrote as a response without the "That's ridiculous" part. It isn't ridiculous, it is just a factual description of reality. The reason the US can afford the strong social programs is because of its heavy commitment to capitalism. If a country is poor and weak, then it can't afford to endure the pain that a strong social program causes. Poor countries just can't sustain populations of people who consume resources and don't create anything especially valuable. If you scale up the social programs too far, at some point the wealth destruction becomes intolerable; there's some optimal amount of damage that can be accepted, and "lean in to socialism" isn't the best strategy to find that balance, because by the time the pain becomes intolerable it has already happened.

> The reason the US can afford the strong social programs is because of its heavy commitment to capitalism.

The US does not have strong social programs. It's an example of the opposite. Look at Canada and Europe for examples of strong social programs.


"communist Russia"

Actually, it's extremely well documented in scientific studies that money absolutely makes you happy up to a certain point. Basically, if you don't have a home and food because you don't have enough funds, then yes, money absolutely equals happiness.

Inequality has grown to the point where the majority of younger people now have no hope of ever owning a home, and even large parts of the country are struggling with something as basic as food.

The HN crowd lives in a top 5% bubble and often forgets how bad it is for most people. All this talk of "money doesn't buy happiness" is terrible. Money for basic necessities is the problem here.


It goes a little further than “money for basic necessities”.

It’s about being able to provide the necessities AND having income security. I remember reading about a study that said that for poor people, having to scramble to deal with all of the extra steps that accompany being poor (no credit cards, maybe no bank account, dealing with getting utilities turned back on, etc.) is the equivalent of losing about 15 IQ points from your optimum.

It’s the difference between being able to work “in the zone” / flow state frequently and being always stuck in “fight or flight” mode. One makes you successful while the other actively sabotages you.


No, happiness increases linearly with log(money), well beyond basic necessities https://happiness-science.org/price-of-happiness/

> Inequality has grown to the point where the majority of younger people now have no hope of ever owning a home

Please stop repeating this myth. Look further up the thread for gen Z homeowner statistics.


> in scientific studies that money absolutely makes you happy up to a certain point

Perceived happiness. It's hard to talk about happiness with a person with an empty stomach. But I was much happier when I was young and poor than I am now that I'm not poor but no longer young.


> Perceived happiness

Is there any other kind?


You can buy any bicycle[0] you want when you are rich, but if you didn't have a bicycle in your childhood then you didn't have a bicycle in your childhood.

[0] or LEGO, Transformers action figure, whatever


That's not because you were poor then and not now, but because you had few responsibilities then and many now. When you were young, your needs were small and solely affected you; now your needs encompass those of a family (I'm guessing that you have children). Even without a family you have responsibilities to society and an employer that you either did not have when young or that were simply less urgent.

The UX used to be better by a country mile. The liquid glass update was a genuinely serious regression. Is Windows or Android now better? At least those operating systems don't have constant contrast issues and flickering. At this point they probably have more consistency.

MacOS reliability has slowly gotten worse and worse, but the UX drop with liquid glass was profound.


I don't agree with the whining about liquid glass. Sure, it isn't the design you like. But usability really isn't that different.

No, it's objectively bad in terms of usability. There is also the matter of taste, but I'm not even talking about that. I'm talking about UX, not style. UX is about functionality and usability.

Contrast is an objective measure. There are well studied and known levels where you can have trouble reading, or an easy time reading. Similarly, things like drag regions not even aligning with visual elements are literally indefensible. This stuff is so basic you'd fail a UX 101 course with it.

Things like Spotlight defaulting to the newest item, so that it changes your selected item the millisecond before you hit enter. I'm not even sure how you'd try to defend UI elements literally flickering as either style or as not affecting usability.

It's objectively bad by a great many widely agreed upon and studied standards.
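
To put a number on "contrast is an objective measure": the usual yardstick is the WCAG contrast ratio, which you can compute yourself (a minimal sketch; WCAG's AA threshold for normal body text is 4.5:1):

    def linear(c):
        # sRGB channel in 0-1 to linear light, per the WCAG relative-luminance definition
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    def luminance(rgb):
        r, g, b = (linear(v / 255) for v in rgb)
        return 0.2126 * r + 0.7152 * g + 0.0722 * b

    def contrast_ratio(fg, bg):
        lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
        return (lighter + 0.05) / (darker + 0.05)

    # Light grey text on white: about 2.3:1, well under the 4.5:1 AA threshold
    print(round(contrast_ratio((170, 170, 170), (255, 255, 255)), 2))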


Contrast was bad in the first couple betas, but now it’s very similar to iOS 18.

You're still reacting to the early beta, I think.

No, I don’t generally use betas. In fact the Liquid Glass release was the first time I DID sign up for betas, but only after the actual release because I wanted to get the fixes faster.

While they’ve improved some of the contrast issues, all the other issues I mentioned are there to this day.


I agree. MacOS became completely unusable with Liquid Glass; it totally feels like one of those amateur custom themes for Linux.

I hope the new leadership will bring back better software. As of now, macOS 26 is disgusting.


Wealth concentration has been happening for a century. You don't need AI for that.


The power grids of US states are similarly linked. Very dirty.


Except for Texas, which decided as a state that avoiding federal regulation was worth people dying every winter from power outages.


I'm not a fan of Texan electrical isolationism, but "people dying every winter from power outages" is stretching it a bit...


Every winter is a stretch, yes.

But they did get a big warning shot in 1989 and 2011, and ignored those lessons for cost reasons. A couple hundred people died.


> But they did get a big warning shot in 1989 and 2011, and ignored those lessons for cost reasons.

Cost is always a valid reason!

> A couple hundred people died.

Looks like about a thousand people in the US die of hypothermia every year, on average. So this happens frequently in states that aren't in its own interconnection, too.


> Looks like about a thousand people in the US die of hypothermia every year, on average.

In their powerless homes?

I don't doubt people get lost in the woods. But that's not some systemic failure.


Which actually works out to rather more than one person per winter, when averaged out.


Like all the Canadians who die every winter in the Halifax explosion of 1917.


Ya, it was just one winter where people actually died; it was recent though.

