Reddit blocks non-profit Wayback Machine from archiving the site

The Internet Archive’s Wayback Machine is one of the most valuable free services available on the web, ensuring that important sources of information are protected from the vicissitudes of fate and tech companies.

Until recently, the archive was able to capture the entirety of Reddit, but that is no longer the case following new restrictions implemented by the for-profit community discussion platform …

The Internet Archive

The archive has been in operation since 1996.

We began in 1996 by archiving the Internet itself, a medium that was just beginning to grow in use. Like newspapers, the content published on the web was ephemeral – but unlike newspapers, no one was saving it. Today we have 28+ years of web history accessible through the Wayback Machine and we work with 1,200+ library and other partners through our Archive-It program to identify important web pages.

To date, it has archived 835 billion web pages, alongside books, audio recordings, photos, videos, and apps. Millions of people use it every day, from researchers and historians to the general public.

Reddit blocks Wayback Machine

Engadget reports that Reddit is almost completely blocking the Wayback Machine from crawling content on the platform.

The company has begun to place new restrictions on what the archive site will be able to access in a move that will significantly limit the Wayback Machine’s ability to preserve information from Reddit.

With the change, the Wayback Machine, a project run by the nonprofit Internet Archive, will only be able to crawl Reddit’s homepage. It will no longer be able to access comments, subreddit pages, post details, profiles and other data.

This is despite Reddit saying last year that it would not block good-faith actors, explicitly naming the Internet Archive as one of them.

Along with our updated robots.txt file, we will continue rate-limiting and/or blocking unknown bots and crawlers from accessing reddit.com. This update shouldn’t impact the vast majority of folks who use and enjoy Reddit. Good faith actors – like researchers and organizations such as the Internet Archive – will continue to have access to Reddit content for non-commercial use.
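As a rough illustration of the robots.txt mechanism mentioned in that statement, the sketch below uses Python's standard `urllib.robotparser` to show how a crawler would interpret rules that leave a homepage open but close off deeper pages. The rules, the `example-archiver` user agent, the `example.com` host and the `allowed` helper are all hypothetical, invented for this example; they are not Reddit's actual file or the Wayback Machine's actual crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT Reddit's real robots.txt: the homepage stays
# crawlable because no rule matches "/", while deeper paths are disallowed.
ROBOTS_TXT = """\
User-agent: example-archiver
Disallow: /r/
Disallow: /user/
Disallow: /comments/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "example-archiver") -> bool:
    """Return True if the hypothetical crawler may fetch the given path."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/", "/r/apple/", "/user/someone", "/comments/abc123"):
        print(path, "->", "allowed" if allowed(path) else "blocked")
```

Note that robots.txt is only a request that well-behaved crawlers choose to honor. Rate-limiting and outright blocking, which the statement also mentions, are enforced on the server side.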

All stems from monetizing user content

The restrictions are the latest step in Reddit’s growing effort to sell access to user content while blocking free access to it. The focus on monetization was driven by the company’s IPO.

Google pays Reddit more than $60 million a year to access user content to help train its AI models, and a similar deal was struck with OpenAI. After striking the Google deal, Reddit started blocking all other search engines.

It’s been speculated that some AI companies may have been indirectly scraping content from Reddit via the Wayback Machine, and that this may have driven the new restrictions.

Reddit had previously introduced radical API changes that killed third-party apps, resulting in widespread protests by moderators and users. The company had also confirmed plans for paid subreddits, but for now these are on hold.

Image: 9to5Mac modification of Reddit image

Author

Ben Lovejoy

Ben Lovejoy is a British technology writer and EU Editor for 9to5Mac. He’s known for his op-eds and diary pieces, exploring his experience of Apple products over time, for a more rounded review. He also writes fiction, with two technothriller novels, a couple of SF shorts and a rom-com!