Google announced last night that it is looking to develop a complementary protocol to the nearly 30-year-old robots.txt protocol, prompted by the new generative AI technologies that Google and other companies are releasing.
This announcement comes shortly after the news about OpenAI accessing paywalled content for its ChatGPT service. But I know many of you are not surprised that Google and others are exploring alternatives to robots.txt with all this generative AI technology floating around the web.
Nothing is changing today; all Google announced is that in the “coming months” it will hold discussions with the “community” to come up with ideas for a new solution.
Google wrote, “Today, we’re kicking off a public discussion, inviting members of the web and AI communities to weigh in on approaches to complementary protocols. We’d like a broad range of voices from across web publishers, civil society, academia and more fields from around the world to join the discussion, and we will be convening those interested in participating over the coming months.”
Google added that it believes “it’s time for the web and AI communities to explore additional machine-readable means for web publisher choice and control for emerging AI and research use cases.”
What this all means right now, I don’t know. But here are some responses to my tweet about it:
How about allowing regular expressions in robots.txt? I bet that would solve 75% of the crawl directive challenges SEOs run into.
— Eric Heiken (@EricHeiken) July 6, 2023
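For context on that first suggestion: robots.txt today only supports a limited form of pattern matching, not full regular expressions. Google’s parser recognizes the * wildcard for any run of characters and $ to anchor the end of a URL. A quick illustration (the paths are just sample values I made up):

    User-agent: Googlebot
    # Block any URL whose path starts with /private/
    Disallow: /private/
    # Block any URL ending in .pdf ($ anchors the end of the URL)
    Disallow: /*.pdf$
    # Full regex, e.g. Disallow: ^/reports/\d{4}/ — NOT supported today,
    # which is the gap the tweet above is pointing at.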
I think it works OK, although maybe after 30y it should become robots.xml or something since lots of stuff has been added, and structured file might be more prone to accidental errors
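And to illustrate that second suggestion: a structured robots.xml does not exist anywhere today, so what follows is a purely hypothetical sketch of what such a file could look like, with every element name and the bot name invented for illustration:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Hypothetical robots.xml; no crawler supports this format -->
    <robots>
      <group agent="ExampleAIBot">      <!-- invented AI crawler name -->
        <disallow>/articles/</disallow> <!-- keep this bot out of articles -->
      </group>
      <group agent="*">
        <allow>/</allow>                <!-- everyone else may crawl -->
      </group>
      <sitemap>https://example.com/sitemap.xml</sitemap>
    </robots>

The trade-off here: an XML file can be validated against a schema before a crawler ever reads it, but it is also easier to break with a stray character than the simple line-based format site owners hand-edit today.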