[Glass Wings] Blocking The Internet Archive Won’t Stop AI, But It Will Erase The Web’s Historical Record

<https://www.techdirt.com/2026/03/26/blocking-the-internet-archive-wont-stop-ai-but-it-will-erase-the-webs-historical-record/>

"Imagine a newspaper publisher announcing it will no longer allow libraries to
keep copies of its paper.

That’s effectively what’s begun happening online in the last few months. The
Internet Archive—the world’s largest digital library—has preserved newspapers
since it went online in the mid-1990s. The Archive’s mission is to preserve the
web and make it accessible to the public. To that end, the organization
operates the Wayback Machine, which now contains more than one trillion
archived web pages and is used daily by journalists, researchers, and courts.

But in recent months The New York Times began blocking the Archive from
crawling its website, using technical measures that go beyond the web’s
traditional robots.txt rules. That risks cutting off a record that historians
and journalists have relied on for decades. Other newspapers, including The
Guardian, seem to be following suit.

For nearly three decades, historians, journalists, and the public have relied
on the Internet Archive to preserve news sites as they appeared online. Those
archived pages are often the only reliable record of how stories were
originally published. In many cases, articles get edited, changed, or
removed—sometimes openly, sometimes not. The Internet Archive often becomes the
only source for seeing those changes. When major publishers block the Archive’s
crawlers, that historical record starts to disappear.

The Times says the move is driven by concerns about AI companies scraping
news content. Publishers seek control over how their work is used, and
several—including the Times—are now suing AI companies over whether training
models on copyrighted material violates the law. There’s a strong case that
such training is fair use.

Whatever the outcome of those lawsuits, blocking nonprofit archivists is the
wrong response. Organizations like the Internet Archive are not building
commercial AI systems. They are preserving a record of our history. Turning off
that preservation in an effort to control AI access could essentially torch
decades of historical documentation over a fight that libraries like the
Archive didn’t start, and didn’t ask for.

If publishers shut the Archive out, they aren’t just limiting bots. They’re
erasing the historical record."

Cheers,
       *** Xanni ***
--
mailto:xanni@xanadu.net               Andrew Pam
http://xanadu.com.au/                 Chief Scientist, Xanadu
https://glasswings.com.au/            Partner, Glass Wings
https://sericyb.com.au/               Manager, Serious Cybernetics

Sat, 28 Mar 2026 03:20:05 +1100

Andrew Pam <xanni [at] glasswings.com.au>