Google revealed details of two new crawlers that are optimized for scraping image and video content for “research and development” purposes. Although the documentation doesn’t explicitly say so, it’s presumed that blocking the new crawlers will have no impact on ranking.
Note that the data scraped by these crawlers is not explicitly intended for AI training; that’s what the Google-Extended crawler is for.
GoogleOther Crawlers
The two new crawlers are variants of Google’s GoogleOther crawler, which launched in April 2023. The original GoogleOther crawler was also designated for use by Google product teams for research and development in what are described as one-off crawls. That description offers clues about what the new GoogleOther variants will be used for.
The purpose of the original GoogleOther crawler is officially described as:
“GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.”
Two GoogleOther Variants
There are two new GoogleOther crawlers:
- GoogleOther-Image
- GoogleOther-Video
The new variants are for crawling binary data, which is data that’s not text. HTML is generally stored as text, encoded as ASCII or Unicode. If a file can be read in a text viewer, it’s a text file. Binary files are files that can’t be meaningfully opened in a text viewer app, such as image, audio, and video files.
The new GoogleOther variants are for image and video content. Google lists user agent tokens for both of the new crawlers, which can be used in a robots.txt file to block them.
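The text-versus-binary distinction can be illustrated with a rough Python sketch. The helper name `looks_like_text` and the sample byte strings are hypothetical, and the check (does the data decode as UTF-8?) is only a heuristic for illustration, not how any crawler classifies content.

```python
def looks_like_text(data: bytes) -> bool:
    """Heuristic: treat bytes as text if they decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Illustrative samples (assumptions, not real crawled content).
HTML_SAMPLE = b"<html><body>Hello</body></html>"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first eight bytes of every PNG file

if __name__ == "__main__":
    print(looks_like_text(HTML_SAMPLE))
    print(looks_like_text(PNG_SIGNATURE))
```

The HTML sample decodes as text, while the PNG signature begins with a byte that is invalid at the start of a UTF-8 sequence, so it is treated as binary. Some binary data can still decode as UTF-8 by coincidence, which is why this is only a rough test.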
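As a minimal sketch of what blocking could look like, the Python snippet below checks a hypothetical robots.txt against the two crawler names using the standard library's `urllib.robotparser`. It assumes the crawler names double as their user agent tokens, and the example.com URL is a placeholder. Python's parser is not guaranteed to match user agents the same way Google's does, so treat this as a sanity check and not proof of how Google will behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks both new crawlers site-wide.
# Assumption: the tokens are the crawler names given in the article.
ROBOTS_TXT = """\
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/media/photo.jpg"  # placeholder URL
    for agent in ("GoogleOther-Image", "GoogleOther-Video", "Googlebot"):
        print(agent, can_crawl(ROBOTS_TXT, agent, url))
```

With these rules, the two GoogleOther variants are disallowed, while a crawler with no matching group, such as Googlebot, is still allowed.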
1. GoogleOther-Image
User agent tokens: