Why are news websites blocking AI crawlers?
- Project Tailwind Research Team
- Mar 4, 2024
A recent study by the Reuters Institute for the Study of Journalism (RISJ) found that nearly half of the top news websites in ten countries are blocking AI crawlers from accessing their content [1]. AI crawlers are programs that automatically collect data from the web for purposes such as training large language models (LLMs) or retrieving information in real time. Examples of LLM-based assistants include ChatGPT and Gemini, which generate natural-language responses and answer users' questions.
The study examined the websites of 200 news outlets across Argentina, Brazil, France, Germany, India, Mexico, Poland, South Africa, Spain, and the US, and checked whether they were blocking the AI crawlers of OpenAI (the company behind ChatGPT) and Google (the company behind Gemini).
The results showed that:
- By the end of 2023, 48% of the news websites were blocking OpenAI's crawlers, and 24% were blocking Google's AI crawler.
- Almost every website that blocked Google's AI crawler (97%) also blocked OpenAI's crawlers.
- The share of news websites blocking AI crawlers varied considerably by country. For OpenAI, it ranged from 79% in the US to 20% in Mexico and Poland; for Google, from 60% in Germany to 7% in Poland and Spain.
- News outlets with a relatively large online news reach were slightly more likely to block AI crawlers than those with a relatively small reach.
- All types of news outlets were blocking, but the websites of legacy print publications were more likely to block than those of broadcasters or digital-born outlets.
The study did not investigate why news websites are blocking AI crawlers, but possible explanations include:
- News publishers may want to protect their intellectual property rights and prevent unauthorized use of their content by AI companies or users.
- News publishers may want to avoid legal or ethical problems that could arise when AI systems use their content, such as misrepresentation, bias, or harm.
- News publishers may want to negotiate with AI companies for compensation or collaboration, rather than allowing free access to their content.
- News publishers may want to maintain control over their content and audience, and avoid losing traffic or revenue to AI platforms or services.
The study suggests that news publishers are more likely to block AI crawlers than popular websites in general, and that the trend is likely to continue or grow. This could affect how AI systems are developed and how well they perform, as well as how accessible and available information is on the web. The study calls for more research and dialogue on the topic, and for more transparency and accountability from both news publishers and AI companies.