Creative Commons announces preliminary support for AI ‘pay-to-crawl’ systems

After announcing a framework for an open AI ecosystem earlier this year, the nonprofit Creative Commons is now backing “pay-to-crawl” technology, a system that automates compensation for website content when it is accessed by machines, such as AI web crawlers.

Creative Commons (CC) is best known for spearheading the licensing movement that allows creators to share their works while retaining copyright. In July, the organization announced a plan to provide a legal and technical framework for dataset sharing between the companies that control the data and the AI providers that want to train on it.

Now, the nonprofit is tentatively backing pay-to-crawl systems, saying it is “cautiously supportive.”

“Implemented responsibly, pay-to-crawl can represent a way for websites to sustain the creation and sharing of their content and manage substitute uses, keeping content publicly available where it would otherwise not be shared or would disappear behind even more restrictive paywalls,” a CC blog post says.

With companies like Cloudflare leading the way, the idea behind pay-to-crawl would be to charge AI bots every time they scrape a website to collect its content for model training and updates.
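
Cloudflare’s pay-per-crawl design, for example, reportedly builds on the long-reserved HTTP 402 “Payment Required” status code. The following is a minimal sketch in Python of how such a gate could work, assuming a hypothetical crawler-payment-token request header and an illustrative crawler-price response header; neither is any vendor’s actual protocol.

    # Minimal sketch of a pay-to-crawl gate (illustrative only).
    # Assumes a hypothetical "crawler-payment-token" header; real systems
    # negotiate pricing and verify payment very differently.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    KNOWN_AI_CRAWLERS = {"GPTBot", "ClaudeBot", "PerplexityBot"}  # illustrative list
    PRICE_USD = "0.01"  # hypothetical flat per-request price

    class PayToCrawlHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            user_agent = self.headers.get("User-Agent", "")
            is_ai_crawler = any(bot in user_agent for bot in KNOWN_AI_CRAWLERS)
            has_token = self.headers.get("crawler-payment-token") is not None

            if is_ai_crawler and not has_token:
                # 402 Payment Required: advertise a price instead of serving content.
                self.send_response(402)
                self.send_header("crawler-price", PRICE_USD)  # hypothetical header
                self.end_headers()
                self.wfile.write(b"Payment required to crawl this content.\n")
                return

            # Human visitors and paying crawlers get the page as usual.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body>Article text...</body></html>")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), PayToCrawlHandler).serve_forever()

A real deployment would verify the token cryptographically and tie it to a billing relationship between the crawler operator and the site or its CDN; this sketch only checks that a token is present at all.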

In the past, websites freely allowed web crawlers to index their content for inclusion in search engines like Google. They benefited from this arrangement by seeing their websites listed in search results, which drove visitors and clicks. With AI technology, however, the dynamic has changed: after a consumer gets their answer via an AI chatbot, they are unlikely to click through to the source.

This shift has already devastated publishers by gutting their search traffic, and it shows no signs of letting up.

A pay-to-crawl system, on the other hand, could help publishers recover from the hit AI has dealt to their bottom lines. Plus, it could work better for smaller web publishers that lack the leverage to negotiate one-off content deals with AI providers. Major agreements have already been struck between OpenAI and publishers such as Condé Nast and Axel Springer, as well as between Perplexity and Gannett, Amazon and The New York Times, and Meta and various media publishers, among others.

CC offered several caveats to its support for pay-to-crawl, noting that such systems could concentrate power on the web. They could also block access to content for “researchers, nonprofits, cultural heritage institutions, educators and other actors working in the public interest.”

It proposed a number of principles for responsible pay-to-crawl, including not making pay-to-crawl a default setting for all websites and avoiding blanket rules for the web. Additionally, it said pay-to-crawl systems should allow for throttling, not just blocking, and should preserve the public interest. They must also be open, interoperable and built with standardized components.

Cloudflare isn’t the only company investing in the pay-to-crawl space.

Microsoft is also building an AI marketplace for publishers, and smaller startups like ProRata.ai and TollBit are working on similar systems. Another group, the RSL Collective, has announced a specification for a new standard called Really Simple Licensing (RSL) that would dictate which parts of a website crawlers could access, while stopping short of blocking the crawlers. Cloudflare, Akamai and Fastly have since adopted RSL, which is backed by Yahoo, Ziff Davis, O’Reilly Media and others.
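
RSL is an XML-based format for attaching machine-readable license terms to web content. As a rough sketch of the concept, a declaration might look something like the following; the element names here are illustrative assumptions rather than the official schema, which lives at rslstandard.org.

    <!-- Illustrative sketch only; consult rslstandard.org for the real spec. -->
    <rsl xmlns="https://rslstandard.org/rsl">
      <content url="https://example.com/articles/">
        <license>
          <permits type="usage">ai-train</permits>
          <payment type="purchase"/>
        </license>
      </content>
    </rsl>

The point is that the license terms travel with the content in machine-readable form: a compliant crawler can read them, decide whether to pay, and proceed, rather than being blocked outright.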

CC was also among those announcing support for RSL, alongside CC Signals, its wider project to develop technology and tools for the AI era.
