As I recall it, there was a time when copyright infringement on YouTube was so rampant that the rightsholders essentially forced the creation of the first watermarking system that worked at massive scale. I do wonder if any corners of research are currently studying the attribution problem with licensing as the specific motivation.
Yeah, that was the old Viacom vs. YouTube days. Here is a great video if you have half an hour to spare: https://www.youtube.com/watch?v=qV2h_KGno9w . Pretty funny court case where it turns out Viacom was violating their OWN copyright... set a massive precedent.
But one thing this reminds me of is the idea of a "trap street": mapmakers used to put false locations on their maps to prove that other mapmakers were copying them: https://en.wikipedia.org/wiki/Trap_street . I figure you could do something similarly adversarial with AI by polluting the public training data on the internet. IDK, something like adversarial attacks on image classifiers: https://www.youtube.com/watch?v=AOZw1tgD8dA . With an LLM you could try to turn it into a Manchurian candidate.
Also, the pool of public domain data is always increasing, so the AI will eventually win in any case, even if we have to wait 100 years.