Wikimedia's recent post completely misses the mark. What they're experiencing isn't merely bulk data collection – it's the unauthorized transformation of their content infrastructure into a free API service for commercial AI tools.
Crawling for training isn't the real issue, and it's an oversimplification to say that AI companies are merely "training" on someone's data.
When systems like Claude and ChatGPT fetch Wikimedia content to answer user queries in real time, they're effectively using Wikimedia as an API – with zero compensation, zero attribution, and zero of the typical API management that would come with such usage. Each time a user asks these AI tools a question, they may trigger fresh calls to Wikimedia servers, creating a persistent, on-demand load rather than a one-time scraping event.
The distinction is crucial. Traditional search engines like Google crawl content, index it, and then send users back to the original site. These AI systems instead extract the value without routing any traffic back, breaking the implicit value exchange that has sustained the web ecosystem.
Wikimedia's focus on technical markers of "bot behavior" – like not interpreting JavaScript or accessing uncommon pages – shows they're still diagnosing this as a traditional crawler problem rather than recognizing the fundamental economic imbalance. They're essentially subsidizing commercial AI products with volunteer-created content and donor-funded infrastructure.
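For concreteness, the markers Wikimedia describes boil down to heuristics along these lines. This is a hypothetical sketch, not Wikimedia's actual detection code; the function name, thresholds, and the use of asset fetches as a stand-in for "interprets JavaScript" are all illustrative assumptions.

```python
# Hypothetical bot heuristic in the spirit of the markers Wikimedia cites.
# All names and thresholds are illustrative, not Wikimedia's real logic.

def likely_bot(requests, popular_pages, asset_hits):
    """Score a client's request log; return True if it looks automated."""
    if not requests:
        return False
    # Real browsers fetch JS/CSS/images alongside pages; pure crawlers don't.
    # Zero asset fetches approximates "does not interpret JavaScript".
    if asset_hits == 0:
        return True
    # Human sessions cluster on popular articles; crawlers sweep the long tail
    # of uncommon pages. Flag clients whose traffic is >90% long-tail.
    uncommon = sum(1 for page in requests if page not in popular_pages)
    return uncommon / len(requests) > 0.9

popular = {"/wiki/Earth", "/wiki/Python"}
likely_bot(["/wiki/Earth", "/wiki/Python"], popular, asset_hits=12)   # False: browser-like
likely_bot(["/wiki/Obscure_%d" % i for i in range(50)], popular, 0)   # True: crawler-like
```

The point of the sketch is the weakness it exposes: heuristics like these are an arms race, whereas a pricing signal (below) makes the classification problem mostly irrelevant.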
The solution has been available all along. HTTP 402 "Payment Required" was built into the web's foundation for exactly this scenario. Combined with the Lightning Network's micropayment capabilities and the L402 protocol implementation, Wikimedia could:
- Keep content free for human users
- Charge AI services per request (even fractions of pennies would add up)
- Generate sustainable infrastructure funding from commercial usage
- Maintain their open knowledge mission while ending the effective subsidy
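The 402 flow behind that list can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration: real L402 pairs a macaroon with a Lightning invoice and verifies payment against a Lightning node, whereas here a random secret stands in for the invoice preimage, and the names (`gate_request`, `issue_challenge`, `ISSUED`) and the crude User-Agent check are assumptions for the example only.

```python
import hashlib
import secrets

# Simplified L402-style gate: humans get content free, automated clients
# get HTTP 402 with a payment challenge and must retry with proof of payment.
ISSUED = {}  # payment_hash -> preimage (in reality, tracked by a Lightning node)

def issue_challenge():
    """Return a 402 response demanding payment, as HTTP always allowed."""
    preimage = secrets.token_hex(16)
    payment_hash = hashlib.sha256(preimage.encode()).hexdigest()
    ISSUED[payment_hash] = preimage  # paying the invoice reveals the preimage
    return 402, {"WWW-Authenticate": f"L402 payment_hash={payment_hash}"}

def gate_request(headers, content="<article text>"):
    """Serve humans freely; demand per-request payment proof from bots."""
    if "Mozilla" in headers.get("User-Agent", ""):  # crude human heuristic
        return 200, content
    proof = headers.get("Authorization", "")
    payment_hash = hashlib.sha256(proof.encode()).hexdigest()
    if ISSUED.get(payment_hash) == proof:
        return 200, content  # the bot paid for this request; serve it
    return issue_challenge()
```

A crawler's first request gets a 402 challenge; it pays the invoice, learns the preimage, retries with it in `Authorization`, and gets the content, all at fractions of a penny per request with no account signup.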
Tools like Aperture make implementation straightforward – a reverse proxy that distinguishes between human and automated access, applying appropriate pricing models to each.
Instead of leading the way toward a sustainable model for knowledge infrastructure in the AI age, Wikimedia is writing blog posts about traffic patterns. If your content is being used as an API, the solution is to become an API – with all the management, pricing, and terms that entails. Otherwise, they'll continue watching their donor resources drain away to support commercial AI inference costs.
I suspect several factors contribute to this resistance:
- **Ideological attachment to "free" as binary rather than nuanced:** Many organizations have built their identity around offering "free" content, creating a false dichotomy where any monetization feels like betrayal of core values. They miss that selective monetization (humans free, automated commercial use paid) could actually strengthen their core mission.
- **Technical amnesia:** The web's architects built payment functionality into HTTP from the beginning, but without a native digital cash system, it remained dormant. Now that Bitcoin and Lightning provide the missing piece, there's institutional amnesia about this intended functionality.
- **Complexity aversion:** Implementing new payment systems feels like adding complexity, when in reality it simplifies the entire ecosystem by aligning incentives naturally rather than through increasingly byzantine rate-limiting and bot-detection schemes.
- **The comfort of complaint:** There's a certain organizational comfort in having identifiable "villains" (bots, crawlers, etc.) rather than embracing solutions that might require internal change. Blog posts lamenting crawler impacts are easier than implementing new systems.
- **False democratization concerns:** Some worry that payment systems would limit access to those with means, missing that micropayments precisely enable democratization by allowing anyone to pay exactly for what they use without arbitrary gatekeeping.
The irony is that Wikimedia already pays for the cost of serving pages — it's just invisible to users because donors cover it. Micropayments via Lightning aren't about "charging for knowledge," they're about sustainable access models in the face of high-frequency bot loads (especially from AI). If AI crawlers are consuming massive resources, it's not unreasonable to explore accountability — not for readers, but for automated extractors.
And, even better, those micropayments could be shared with the volunteers: a party for them, gifts for good behavior, or even a simple birthday card. There's a lot that can be done with resources like this!
Because I evolved the prompt from something that was off to something that made sense. I don't see using any resource as a problem as long as the content is bang on.
The "prompt" as you call it, isn't a sinlge prompt. It's a long discussion that includes the article (which I copy pasta'd) and other references I've worked on recently (I write crawlers and have for years)