Google Quietly Launches New AI Crawler

Google quietly added a new bot to its crawler documentation, one that crawls on behalf of commercial customers of its Vertex AI product. The new crawler appears to crawl only sites whose owners have requested it, but the documentation isn't entirely clear on that point.

Vertex AI Agents

Unlike the other bots listed in the Search Central documentation, which are tied to Google Search or advertising, the new crawler, Google-CloudVertexBot, ingests website content for Vertex AI clients.

The official Google Cloud documentation offers the following information:

“In Vertex AI Agent Builder, there are various kinds of data stores. A data store can contain only one type of data.”

It goes on to list six types of data, one of which is public website data. On crawling, the documentation says there are two kinds of website indexing, each with its own limitations:

  1. Basic website indexing
  2. Advanced website indexing

Documentation Is Confusing

The documentation explains website data:

“A data store with website data uses data indexed from public websites. You can provide a set of domains and set up search or recommendations over data crawled from the domains. This data includes text and images tagged with metadata.”

The above description doesn’t say anything about verifying domains. The description of Basic website indexing doesn’t say anything about site owner verification either.

But the documentation for Advanced website indexing does say that domain verification is required and also imposes indexing quotas.

However, the documentation for the crawler itself says that it crawls at the "site owners' request," so it may not crawl public sites whose owners haven't asked for it.

Now here's the confusing part: the changelog entry for this new crawler suggests that it could come to scrape your site.

Here’s what the changelog says:

“The new crawler was introduced to help site owners identify the new crawler traffic.”

New Google Crawler

The new crawler is called Google-CloudVertexBot.

Here is the new documentation for it:

“Google-CloudVertexBot crawls sites on the site owners’ request when building Vertex AI Agents.

User agent tokens

  • Google-CloudVertexBot
  • Googlebot

User agent substring

  • Google-CloudVertexBot"
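Because the changelog says the crawler was added so site owners can identify its traffic, one practical step is to look for the documented user agent substring in your server logs. The sketch below assumes a combined-log-style access log in which the request line is the first double-quoted field; the log format, the sample lines, and the `count_vertex_bot_hits` helper are all hypothetical and are not taken from Google's documentation.

```python
"""Count requests whose user agent contains "Google-CloudVertexBot".

A minimal sketch. It matches only the documented user agent substring
(case-sensitive) and does not verify that a request really came from
Google, since a user agent string can be spoofed.
"""

from collections import Counter
from typing import Iterable

# Documented user agent substring for the new crawler.
VERTEX_BOT_SUBSTRING = "Google-CloudVertexBot"


def count_vertex_bot_hits(log_lines: Iterable[str]) -> Counter:
    """Return a Counter of requested paths for lines mentioning the bot.

    Assumes (hypothetically) a combined log format where the request line
    is the first double-quoted field, e.g. "GET /page HTTP/1.1".
    """
    hits: Counter = Counter()
    for line in log_lines:
        if VERTEX_BOT_SUBSTRING not in line:
            continue
        parts = line.split('"')
        # parts[1] holds the request line, e.g. "GET /docs HTTP/1.1".
        if len(parts) > 1:
            request = parts[1].split()
            if len(request) >= 2:
                hits[request[1]] += 1
    return hits


if __name__ == "__main__":
    # Made-up log lines for illustration only.
    sample = [
        '203.0.113.5 - - [01/Jan/2025:00:00:00 +0000] "GET /docs HTTP/1.1" 200 512 "-" "hypothetical-agent Google-CloudVertexBot"',
        '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /docs HTTP/1.1" 200 512 "-" "hypothetical-agent Google-CloudVertexBot"',
        '198.51.100.7 - - [01/Jan/2025:00:00:02 +0000] "GET / HTTP/1.1" 200 1024 "-" "hypothetical-browser"',
    ]
    print(count_vertex_bot_hits(sample))  # Counter({'/docs': 2})
```

A substring match is the simplest way to use what the documentation actually provides; confirming that a hit is genuine would require a separate check that this sketch does not attempt.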

Unclear Documentation

The documentation seems to indicate that the new crawler doesn't index public sites, but the changelog says it was added so that site owners can identify its traffic. Should you block the new crawler with robots.txt just in case? It's not unreasonable to consider, given that the documentation is unclear on whether it crawls only domains verified to be under the control of the entity initiating the crawl.
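If you do decide to block it, a robots.txt group naming the `Google-CloudVertexBot` token is the usual approach. The sketch below uses Python's standard-library `urllib.robotparser` to show how a generic parser reads such a file. It illustrates the rule syntax only and is not a guarantee of how Google's crawler interprets it. Because Google's documentation also lists `Googlebot` as a user agent token for this crawler, rules written for `Googlebot` may be relevant too; this example covers only a group that names `Google-CloudVertexBot` directly. The URL is hypothetical.

```python
"""Check how a robots.txt that blocks Google-CloudVertexBot is parsed.

A minimal sketch using the standard-library robots.txt parser; it shows
generic robots.txt semantics, not Google's own implementation.
"""

from urllib.robotparser import RobotFileParser

# Example robots.txt: block only the new crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: Google-CloudVertexBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/page"  # hypothetical URL
    for agent in ("Google-CloudVertexBot", "Googlebot"):
        print(agent, rp.can_fetch(agent, url))
    # Google-CloudVertexBot False
    # Googlebot True
```

Blocking only the specific token leaves regular Googlebot crawling for Search untouched, which is why the example uses a dedicated group rather than changing the `*` or `Googlebot` rules.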

Read Google’s new documentation:

Google-CloudVertexBot

Featured Image by Shutterstock/ShotPrime Studio