If they ignore robots.txt, then what else gives them the right to copy and host content from other sites? As much as I value Wayback and archive.org, I think putting this into the realm of bilateral negotiation and a DMCA-like model outside the courts is a slippery slope. It's a non-solution that risks breeding new monopolies, much as Google's exclusive relationships with news publishers are doing. Is there nothing in HTML metadata (schema.org etc.) informing crawlers and users about usage rights that could be lifted or extended for this purpose, especially now that the EU copyright reform has set a legal framework and recognized principles of the attention economy?
> If they ignore robots.txt, then what else gives them the right to copy and host content from other sites?
The same thing that gives them the right otherwise: fair use, and explicit archiving exceptions written into copyright law. robots.txt adds no additional legality.
Fair use does not give you the right to wholesale scrape content that is otherwise under copyright with a non-CC/open license, which is effectively what the Internet Archive does. (To be clear, I approve of IA's mission, but it operates in a legal grey area.)
robots.txt has never had much legal meaning. Respecting it was mostly a defense along the lines of "You only have to ask, even retrospectively, and we won't copy your content." As a practical matter, very few people are going to sue a non-profit to take down content when they pretty much only have to send an email, with (almost) no questions asked.
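For reference, robots.txt is just a plain-text convention, not a legal instrument: a file at the site root that crawlers may voluntarily honor. A minimal example asking the Internet Archive's crawler to stay away (the `ia_archiver` user-agent is the token IA has historically respected; treat the exact string as illustrative) might look like:

```
# Ask the Internet Archive's historical crawler not to fetch anything
User-agent: ia_archiver
Disallow: /

# All other crawlers: no restrictions
User-agent: *
Disallow:
```

Nothing enforces this file; a crawler that ignores it faces at most the practical and reputational consequences the comment above describes.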
> Fair use does not give you the right to wholesale scrape content
Yes, it potentially does. There are court cases establishing precedent that copying something in its entirety can still be fair use, as well as law and court cases establishing specific allowances for archives/libraries/etc.
There's probably an argument that archiving a particular site as a whole serves some compelling public interest--say, a politician's campaign site. But it seems unlikely that would extend to randomly archiving (and making available to the public) web sites in general.
I've always been told that fair use--as a defense against a copyright infringement claim--is very fact dependent.
IANAL, but I fail to see how fair use can be leveraged to give archive sites a right to host other sites' content when that content is available publicly and on a non-discriminatory basis, and there are e.g. Creative Commons license metadata tags for giving other sites explicit and specific permission to re-host content. There are also concerns to be addressed under the EU copyright reform (e.g. previewing large portions of text from other sites without sending those sites any clicks).
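For illustration, the machine-readable permission mechanism alluded to above is typically a `rel="license"` link in the page head, which crawlers and archives can read; the specific license URL here is just an example:

```html
<head>
  <!-- Declares that this page's content is offered under CC BY-SA 4.0 -->
  <link rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">
</head>
```

Absent such a declaration, the default is ordinary copyright, which is the crux of the disagreement in this thread.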
If your point is that content creators can't technically or "jurisdictionally" stop archival sites from rehosting, then the logical consequence is that content creators need to look at DRM and similarly draconian measures, which I hope they aren't forced to adopt.
The author's jurisdiction is irrelevant. The only question is what jurisdiction's laws apply to the Internet Archive (or in general whatever party does the copying).