AI Bots — Who is Blocking and Why?

I covered some of the potential arguments either way in my previous post, but the truth is that, given how little traffic these models currently drive, the decision is probably not hugely impactful in the short term. If you look at Moz’s robots.txt file at the time of writing, you can see we block GPTBot from our learn center and blog. This is a compromise position, but one from which we haven’t really seen any benefit or harm so far, nor would we expect to in the short term. I certainly don’t think the comparison to blocking Googlebot is fair: LLMs are primarily a content generation tool, not a traffic referral tool. Indeed, Google has suggested that even their AI Overviews are not affected by Google-Extended, but instead by regular Googlebot. Similarly, at the time of writing OpenAI has just announced their direct Google competitor, “SearchGPT,” and confirmed that, like Google, it crawls with a user agent separate from its other generative AI tools: in this case, “OAI-SearchBot.”
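To make the idea of a partial block concrete, here is a minimal sketch in Python using the standard library's robots.txt parser. The robots.txt content, the `/blog/` and `/learn/` paths, and the `example.com` URLs are illustrative assumptions, not a copy of Moz's actual file, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot from two sections only.
# The paths are assumptions for illustration, not taken from a real site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /blog/
Disallow: /learn/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "OAI-SearchBot", "Googlebot"):
        for path in ("/blog/some-post", "/products/"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

Under these assumed rules, GPTBot is kept out of the blog and learn sections but can still reach the rest of the site, while OAI-SearchBot and Googlebot are unaffected because they are matched as separate user agents.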

What I didn’t cover in that article is the case of large publishers. If you are a large publisher and you think you have the leverage to strike a deal, you may wish to set a precedent: that these tools are not owed free access unless they reach a formal arrangement. For example, The Verge’s parent company, Vox Media, publicly said they were blocking access before eventually striking a deal. The robots.txt file on theverge.com still explicitly blocks most other AI bots, but no longer GPTBot.
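A block-by-default policy with exceptions for licensed crawlers might be generated along these lines. This is only a sketch: `build_robots_txt` is a hypothetical helper, `ExampleAIBot` is a made-up user agent, and the choice of which bot counts as licensed is an assumption for illustration rather than a description of any publisher's actual file.

```python
def build_robots_txt(ai_agents, licensed=frozenset()):
    """Build robots.txt text that blocks each AI agent site-wide,
    skipping any agent listed in `licensed`."""
    lines = []
    for agent in ai_agents:
        if agent in licensed:
            continue  # a deal is in place, so no block is emitted
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Every other crawler (regular search bots, for example) stays unrestricted.
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "ExampleAIBot" is hypothetical; the other two tokens appear in the article.
    agents = ["GPTBot", "Google-Extended", "ExampleAIBot"]
    print(build_robots_txt(agents, licensed={"GPTBot"}))
```

Keeping the licensed set separate from the list of known AI agents means that striking a deal changes one entry rather than the whole file.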

Of course, the majority of sites and the majority of readers of this blog post are not large publishers. It may well be significantly more valuable for you to be mentioned in AI-written content than it is for you to try to protect the unique value of your content, particularly in a crowded market of competitors with no such qualms. Still, it’s interesting to see the precedents being set here, and it will be even more interesting to see how it plays out.
