It's clear Claude adapted code directly from the OxCaml implementation (the PR author said he pointed Claude at that code [1] and then provides a ChatGPT analysis [2] that really highlights the plagiarism, but ultimately comes to the conclusion that it isn't plagiarized).
Either that highlights someone who is incompetent or they are willfully being blasé. Neither bodes well for contributing code while respecting copyright (though mixing and matching code on your own private repo that isn't distributed in source or binary form seems reasonable to me).
Sorry, this is just ridicilous and shows how people fragile really are. This whole topic and whole MR as well.
I am routinely looking into the folly implementation, sometimes into the libstdc++, sometimes into libc++, sometimes into boost or abseil etc. to find inspiration for problems that I tackle in other codebases. By the same standards, this should also be plagiarism, no? I manufacture new ideas by compiling existing knowledge from elsewhere. Literally every engineer in the world does the same. Why is AI any different?
Perhaps because the AI assigned copyright in the files to the author of the library it copied from and the person prompting it told it to look at that library. Without even getting into the comedy AI generated apologia to go with it which makes it look worse rather than better.
From a pragmatic viewpoint as an engineer you assign the IP you create over to the company you work for so plagarism has real world potential to lose you your job at best. There's a difference between taking inspiration from something unrelated "oh this is a neat algorithmic approach to solving this class of problems" to "I need to implement this specific feature and it exists in this library so I'll lift it nearly verbatim".
Can you give an example what exactly was copied? I ask because I took a look into MR and original repo, and the conclusion is that the tool only copy-pasted the copyright header but not the code. So I am still wondering - what's wrong with that (it's a silly mistake even a human can make), and where is the copyright infringement everyone is talking about?
You are claiming humans copy-and-paste copyright headers without copying the corresponding code. To prove you're correct, you only need to show one (or a few) examples of it happening. To prove you incorrect, someone would have to go through all code in existence to show the absence of the phenomenon.
None of that matters. The header is there, in writing, and discussed in the PR. It is acknowledged by both parties and the author gives a clumsy response for its existence. The PR is simply tainted by this alone, not to mention other pain points.
You may not consider this problematic. But maintainers of this project sure do, given this was one of the immediate concerns of theirs.
It matters because it completely weakens their point of stance and make them look unreasonable. Header is irrelevant since it isn't copyright infringement, and FWIW when it has been corrected (in the MR), then they decided that the MR is too complex for them and closed the whole issue. Ridiculous.
An incorrect copyright header is a major red flag for non technical reasons. If you think it is an irrelevant minor matter then you do not undesirable several very important social and legal aspects of the issue.
Social maybe yes what legal aspects? Everybody keeps repeating that but there is no copyright infringement. Maybe you can point me to one?
I understand that people are uncomfortable with this, I am likely too, but objectively looking there's technically nothing wrong or different to what humans already do.
The point is that it ended up in the PR in the first place. The submitted seemed unaware of its presence and only looked into it after it was pointed out. This is sloppy and is a major red flag.
The funny thing is that it works, have a look at the MR. It says:
All existing tests pass. Additional DWARF tests verify:
DWARF structure (DW_TAG_compile_unit, DW_TAG_subprogram).
Breakpoints by function and line in both GDB and LLDB.
Type information and variable visibility.
Correct multi-object linking.
Platform-specific relocation handling.
So the burden of proof is obviously not anymore on the MR submitter side but the other.
That is why some people are forbidden to contribute to projects if their eyes have read projects with incompatible licenses, in case people go to copyright court.
Yes what? Both oxcaml and ocaml have compatible LGPL licenses so I didn't get your argument.
But even if that hadn't been the case, what exactly would be the problem? Are you saying that I cannot learn from a copyrighted book written by some respected and known author, and then apply that knowledge elsewhere because I would be risking to be sued for copyright infringement?
The wider point is that copyright headers are a very important detail and that a) the AI got it wrong b) you did not notice c) you have not taken on board the fact that it is important despite being told several times and have dismissed the issue as unimportant
Which raises the question how many other important incorrect details are buried in the 13k lines of code that you are unaware of and unable to recognise the significance of? And how much mantainer time would you waste being dismissive of the issues?
People have taken the copyright header as indicative of wider problems in the code.
Yes, please then find those for now imaginative issues and drill through them? Sorry, but I haven't seen anyone in that MR calling out for technical deficiencies so this is just crying out loud in a public for no concrete reasons.
It's the same as if your colleague sitting next to you would not allow the MR to be merged for various political and not technical reasons - this is exactly what is happening here.
> Yes, please then find those for now imaginative issues and drill through them?
No, that is a massive amount of work which will only establish what we already know with a high degree of certainty due to the red flags already mentored - that this code is too flawed to begin with.
This is not political, this is looking out for warming signs in order to avoid wasting time. At this stage the burden of proof is on the submitter, not the reviewers
Too flawed? Did you miss that tiny detail that MR fixes a long time issue for ocaml? This is exactly political because there's no legal or technical issue. Only fluff by scared developers. I have no stakes in this but I'm sincerely surprised by the amount of unreasonable and unsubstantiated claims and explanations given in this thread and MR
I don't get why you do not understand why nobody wants to waste time on a MR where the author didn't even themselves have any interest on looking over it even once.
https://github.com/ocaml/ocaml/pull/14369/files#diff-bc37d03...
also all the unused functions...
did it fix a long time issue? maybe, but 9 tests for 13k lines doesnt give much confidence in that
and even if it worked perfectly, who will maintain this?
"Yes what? Both oxcaml and ocaml have compatible LGPL licenses so I didn't get your argument."
LGPL is a license for distribution, the copyright of the original authors is retained (unless signed away in a contribution agreement, usually to an organization).
"Are you saying that I cannot learn from a copyrighted book written by some respected and known author, and then apply that knowledge elsewhere because I would be risking to be sued for copyright infringement?"
This was not the case here, so not sure how that is related in any way?
Depends on the license of the original material, which is why they tend to have a list of allowed use cases for copying content.
Naturally there are very flexible ones, very draconian ones, and those in the middle.
Most people get away with them, because it isn't like everyone is taking others to copyright court sessions every single day, unless there are millions at play.
Either that highlights someone who is incompetent or they are willfully being blasé. Neither bodes well for contributing code while respecting copyright (though mixing and matching code on your own private repo that isn't distributed in source or binary form seems reasonable to me).
[1]: https://github.com/ocaml/ocaml/pull/14369#issuecomment-35573...
[2]: https://github.com/ocaml/ocaml/pull/14369#issuecomment-35566...