If they ignore robots.txt, then what else gives them the right to copy and host content from other sites? As much as I value Wayback and archive.org, I think putting this into the realm of bilateral negotiation and a DMCA-like model outside the courts is a slippery slope. It's a non-solution that risks breeding new monopolies, much as Google's exclusive relationships with news publishers are doing. Is there nothing in HTML metadata (schema.org etc.) informing crawlers and users about usage rights that could be lifted or extended for this purpose, especially now that the EU copyright reform has set a legal framework and recognized principles of the attention economy?
> If they ignore robots.txt, then what else gives them the right to copy and host content from other sites?
The same thing that gives them the right otherwise: fair use, and explicit archiving exceptions written into copyright law. robots.txt adds no additional legality.
Fair use does not give you the right to wholesale scrape content that is otherwise under copyright with a non-CC/open license, which is effectively what the Internet Archive does. (To be clear, I approve of IA's mission, but it operates in a legal grey area.)
robots.txt has never had much legal meaning. Respecting it was mostly a defense along the lines of "You only have to ask, even retrospectively, and we won't copy your content." As a practical matter, very few people are going to sue a non-profit to take down content when they pretty much only have to send an email, with (almost) no questions asked.
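For reference, robots.txt is just a plain-text convention, not a legal instrument: a file at the site root that crawlers may voluntarily honor. A minimal example asking the Internet Archive's crawler to stay away (the `ia_archiver` user-agent is the token IA has historically respected; treat the exact string as illustrative) might look like:

```
# Ask the Internet Archive's historical crawler not to fetch anything
User-agent: ia_archiver
Disallow: /

# All other crawlers: no restrictions
User-agent: *
Disallow:
```

Nothing enforces this file; a crawler that ignores it faces at most the practical and reputational consequences the comment above describes.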
> Fair use does not give you the right to wholesale scrape content
Yes, it potentially does. There are court cases establishing precedent that copying something in its entirety can still be fair use, as well as law and court cases establishing specific allowances for archives/libraries/etc.
There's probably an argument that archiving a particular site as a whole serves some compelling public interest--say, a politician's campaign site. But it seems unlikely that would extend to randomly archiving (and making available to the public) web sites in general.
I've always been told that fair use--as a defense against a copyright infringement claim--is very fact dependent.
IANAL, but I fail to see how fair use can be leveraged to give archive sites a right to host other sites' content when that content is available publicly and on a non-discriminatory basis, and there are e.g. Creative Commons license metadata tags for giving other sites explicit and specific permission to re-host content. There are also concerns to be addressed under the EU copyright reform (e.g. previewing large portions of text from other sites without sending those sites any clicks).
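For illustration, the machine-readable permission mechanism alluded to above is typically a `rel="license"` link in the page head, which crawlers and archives can read; the specific license URL here is just an example:

```html
<head>
  <!-- Declares that this page's content is offered under CC BY-SA 4.0 -->
  <link rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">
</head>
```

Absent such a declaration, the default is ordinary copyright, which is the crux of the disagreement in this thread.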
If your point is that content creators can't technically or "jurisdictionally" stop archival sites from rehosting, then the logical consequence is that content creators need to look at DRM and similarly draconian measures, which I hope they aren't forced to adopt.
The author's jurisdiction is irrelevant. The only question is what jurisdiction's laws apply to the Internet Archive (or in general whatever party does the copying).