It's clear Claude adapted code directly from the OxCaml implementation (the PR a...

joelreymont · 2025-11-27T08:44:15 1764233055

The key is that AI adapted, not stole.

It's actually capable of reasoning and generating derivative code and not just copying stuff wholesale.

See examples at the bottom of my post:

https://joel.id/ai-will-write-your-next-compiler/

menaerus · 2025-11-26T07:05:01 1764140701

Sorry, this is just ridiculous and shows how fragile people really are. This whole topic and the whole MR as well.

I am routinely looking into the folly implementation, sometimes into the libstdc++, sometimes into libc++, sometimes into boost or abseil etc. to find inspiration for problems that I tackle in other codebases. By the same standards, this should also be plagiarism, no? I manufacture new ideas by compiling existing knowledge from elsewhere. Literally every engineer in the world does the same. Why is AI any different?

meheleventyone · 2025-11-26T13:27:40 1764163660

Perhaps because the AI assigned copyright in the files to the author of the library it copied from and the person prompting it told it to look at that library. Without even getting into the comedic AI-generated apologia that goes with it, which makes it look worse rather than better.

From a pragmatic viewpoint, as an engineer you assign the IP you create over to the company you work for, so plagiarism has real-world potential to lose you your job at best. There's a difference between taking inspiration from something unrelated ("oh, this is a neat algorithmic approach to solving this class of problems") and "I need to implement this specific feature and it exists in this library so I'll lift it nearly verbatim".

menaerus · 2025-11-26T13:50:06 1764165006

Can you give an example of what exactly was copied? I ask because I took a look at the MR and the original repo, and my conclusion is that the tool only copy-pasted the copyright header but not the code. So I am still wondering: what's wrong with that (it's a silly mistake even a human can make), and where is the copyright infringement everyone is talking about?

DrammBA · 2025-11-27T04:57:29 1764219449

> copy-past[ing] the copyright header but not the code [is] a silly mistake even a human can make

Do you mind showing me some examples of that? That seems so implausible to me.

Just for reference, here's another example of AI adding phantom contributors and the human just ignoring it or not even noticing: https://github.com/auth0/nextjs-auth0/issues/2432

nsagent · 2025-11-27T07:57:22 1764230242

Oh wow. That's just egregious. Considering the widespread use of Auth0, I'm surprised this isn't a bigger story.

menaerus · 2025-11-27T07:14:28 1764227668

> Do you mind showing me some examples of that? That seems so implausible to me

What's so special about it that I need to show you the example?

kqr · 2025-11-27T07:57:59 1764230279

You are claiming humans copy-and-paste copyright headers without copying the corresponding code. To prove you're correct, you only need to show one (or a few) examples of it happening. To prove you incorrect, someone would have to go through all code in existence to show the absence of the phenomenon.

Hence the burden of proof is on you.

menaerus · 2025-11-27T08:47:19 1764233239

No code besides the header was copied, so I am asking: what is so problematic about it?

PunchyHamster · 2025-11-27T12:22:27 1764246147

that was already explained before

spookie · 2025-11-27T05:58:06 1764223086

None of that matters. The header is there, in writing, and discussed in the PR. It is acknowledged by both parties and the author gives a clumsy response for its existence. The PR is simply tainted by this alone, not to mention other pain points.

You may not consider this problematic. But maintainers of this project sure do, given this was one of the immediate concerns of theirs.

joelreymont · 2025-11-27T07:03:31 1764227011

OxCaml is a fork of OCaml, they have the same license.

I wasn't able to find any chunks of code copied wholesale from OxCaml which already has a DWARF implementation.

All that code wasn't written by Mark, AI just decided to paste his copyright all over.

menaerus · 2025-11-27T07:18:11 1764227891

It matters because it completely weakens their stance and makes them look unreasonable. The header is irrelevant since it isn't copyright infringement, and FWIW once it had been corrected (in the MR), they decided that the MR was too complex for them and closed the whole issue. Ridiculous.

biorach · 2025-11-27T10:11:38 1764238298

An incorrect copyright header is a major red flag for non-technical reasons. If you think it is an irrelevant minor matter then you do not understand several very important social and legal aspects of the issue.

menaerus · 2025-11-27T10:23:55 1764239035

Social, maybe, yes, but what legal aspects? Everybody keeps repeating that but there is no copyright infringement. Maybe you can point me to one?

I understand that people are uncomfortable with this, I likely am too, but looking at it objectively there's technically nothing wrong with it or different from what humans already do.

biorach · 2025-11-27T11:29:23 1764242963

The point is that it ended up in the PR in the first place. The submitter seemed unaware of its presence and only looked into it after it was pointed out. This is sloppy and is a major red flag.

menaerus · 2025-11-27T13:20:40 1764249640

So there's no point? Sloppy, maybe, yes, but technically incorrect or legally questionable, no. The struggle is real.

pepoluan · 2025-11-28T01:39:19 1764293959

If the submitter is sloppy with things that are not complicated, how can one be sure of things that ARE complicated?

menaerus · 2025-11-28T08:03:22 1764317002

The funny thing is that it works, have a look at the MR. It says:

  All existing tests pass. Additional DWARF tests verify:

  DWARF structure (DW_TAG_compile_unit, DW_TAG_subprogram).
  Breakpoints by function and line in both GDB and LLDB.
  Type information and variable visibility.
  Correct multi-object linking.
  Platform-specific relocation handling.

So the burden of proof is obviously no longer on the MR submitter's side but on the other.

pjmlp · 2025-11-26T21:52:07 1764193927

Yes?

That is why some people are forbidden to contribute to projects if their eyes have read projects with incompatible licenses, in case people go to copyright court.

menaerus · 2025-11-27T07:51:54 1764229914

Yes what? Both oxcaml and ocaml have compatible LGPL licenses so I didn't get your argument.

But even if that hadn't been the case, what exactly would be the problem? Are you saying that I cannot learn from a copyrighted book written by some respected and known author, and then apply that knowledge elsewhere because I would be risking to be sued for copyright infringement?

biorach · 2025-11-27T10:20:00 1764238800

The wider point is that copyright headers are a very important detail and that a) the AI got it wrong b) you did not notice c) you have not taken on board the fact that it is important despite being told several times and have dismissed the issue as unimportant

Which raises the question: how many other important incorrect details are buried in the 13k lines of code that you are unaware of and unable to recognise the significance of? And how much maintainer time would you waste being dismissive of the issues?

People have taken the copyright header as indicative of wider problems in the code.

menaerus · 2025-11-27T10:26:41 1764239201

Yes, please then find those for now imaginative issues and drill through them? Sorry, but I haven't seen anyone in that MR calling out technical deficiencies, so this is just crying out loud in public for no concrete reason.

It's the same as if your colleague sitting next to you would not allow the MR to be merged for various political and not technical reasons - this is exactly what is happening here.

biorach · 2025-11-27T11:26:27 1764242787

> Yes, please then find those for now imaginative issues and drill through them?

No, that is a massive amount of work which will only establish what we already know with a high degree of certainty due to the red flags already mentioned: that this code is too flawed to begin with.

This is not political, this is looking out for warning signs in order to avoid wasting time. At this stage the burden of proof is on the submitter, not the reviewers.

menaerus · 2025-11-27T13:22:25 1764249745

Too flawed? Did you miss that tiny detail that the MR fixes a long-time issue for ocaml? This is exactly political because there's no legal or technical issue. Only fluff by scared developers. I have no stake in this but I'm sincerely surprised by the amount of unreasonable and unsubstantiated claims and explanations given in this thread and the MR.

zeratax · 2025-12-03T13:37:38 1764769058

I don't get why you do not understand why nobody wants to waste time on an MR where the author didn't even have any interest in looking over it themselves even once. https://github.com/ocaml/ocaml/pull/14369/files#diff-bc37d03... also all the unused functions...

did it fix a long time issue? maybe, but 9 tests for 13k lines doesn't give much confidence in that

and even if it worked perfectly, who will maintain this?

ahoka · 2025-11-27T11:07:41 1764241661

"Yes what? Both oxcaml and ocaml have compatible LGPL licenses so I didn't get your argument."

LGPL is a license for distribution, the copyright of the original authors is retained (unless signed away in a contribution agreement, usually to an organization).

"Are you saying that I cannot learn from a copyrighted book written by some respected and known author, and then apply that knowledge elsewhere because I would be risking to be sued for copyright infringement?"

This was not the case here, so not sure how that is related in any way?

menaerus · 2025-11-27T13:26:08 1764249968

Do you understand that no code besides the copyright header was copied? So what copyright exactly are you talking about?

pjmlp · 2025-11-27T08:48:58 1764233338

Depends on the license of the original material, which is why they tend to have a list of allowed use cases for copying content.

Naturally there are very flexible ones, very draconian ones, and those in the middle.

Most people get away with them, because it isn't like everyone is taking others to copyright court sessions every single day, unless there are millions at play.