Google quietly added a new bot to its crawler documentation, one that crawls on behalf of commercial clients of its Vertex AI product. It appears that the new crawler may only crawl sites that those clients control, but the documentation isn’t entirely clear on that point.
Vertex AI Agents
Google-CloudVertexBot, the new crawler, ingests website content for Vertex AI clients, unlike other bots listed in the Search Central documentation that are tied to Google Search or advertising.
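For site owners who want to see how their current robots.txt rules would treat the new crawler, the following is a minimal Python sketch using the standard library's `urllib.robotparser`. It assumes that the crawler name, Google-CloudVertexBot, also works as the robots.txt user-agent token and that the crawler honors those rules. The robots.txt content and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Crawler name taken from the article. Treating it as the robots.txt
# user-agent token is an assumption, not something confirmed here.
VERTEX_BOT = "Google-CloudVertexBot"


def is_allowed(robots_txt: str, url: str, user_agent: str = VERTEX_BOT) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks only the Vertex AI crawler.
EXAMPLE_ROBOTS = """\
User-agent: Google-CloudVertexBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    url = "https://example.com/pricing"  # placeholder URL
    print(is_allowed(EXAMPLE_ROBOTS, url))               # rules for the Vertex AI crawler
    print(is_allowed(EXAMPLE_ROBOTS, url, "Googlebot"))  # rules for any other crawler
```

With the sample rules, the first check reports that the Vertex AI crawler is blocked, while the second shows that other crawlers fall through to the wildcard group and remain allowed.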
The official Google Cloud documentation offers the following information:
“In Vertex AI Agent Builder, there are various kinds of data stores. A data store can contain only one type of data.”
It goes on to list six types of data, one of which is public website data. On crawling, the documentation says there are two kinds of website crawling, each with its own limitations:
- Basic website indexing
- Advanced website indexing
Documentation Is Confusing
The documentation explains website data:
“A data store with website data uses data indexed from public websites. You can provide a set of domains and set up search or recommendations over data crawled from the domains. This data includes text and images tagged with metadata.”
The above description doesn’t say anything about verifying domains. The description of Basic website indexing doesn’t say anything about site owner verification either.
But the documentation for Advanced website indexing does state that domain verification is required and that indexing quotas apply.
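Since the documentation leaves open whether the crawler visits only verified domains, site owners may want to check their own server logs for it. The Python sketch below counts requests whose log line mentions the crawler name. It assumes an access log in the common combined format and that the name appears somewhere in the user-agent field. The sample lines are hypothetical, and because user-agent strings can be spoofed, a match is only a hint.

```python
import re
from collections import Counter
from typing import Iterable

# Crawler name from the article; its presence in the user-agent field
# of a log line is an assumption.
VERTEX_BOT = "Google-CloudVertexBot"

# Captures the request path from a quoted request such as "GET /path HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*"')


def vertex_bot_hits(log_lines: Iterable[str]) -> Counter:
    """Count requested paths on lines that mention the crawler name."""
    hits: Counter = Counter()
    for line in log_lines:
        if VERTEX_BOT.lower() not in line.lower():
            continue  # not a request from the crawler we are looking for
        match = REQUEST_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


# Hypothetical log lines with a simplified user-agent field.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /docs HTTP/1.1" 200 512 "-" "Google-CloudVertexBot"',
    '203.0.113.5 - - [01/Jan/2025:00:00:09 +0000] "GET /docs HTTP/1.1" 200 512 "-" "Google-CloudVertexBot"',
    '198.51.100.7 - - [01/Jan/2025:00:00:12 +0000] "GET /docs HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for path, count in vertex_bot_hits(SAMPLE_LOG).most_common():
        print(path, count)
```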