Google has published a fresh installment of its educational video series “How Search Works,” explaining how its search engine discovers and accesses web pages through crawling.
Google Engineer Details Crawling Process
In the seven-minute episode hosted by Google Analyst Gary Illyes, the company provides an in-depth look at the technical aspects of how Googlebot—the software Google uses to crawl the web—functions.
Illyes outlines the steps Googlebot takes to find new and updated content across the internet’s trillions of webpages and make them searchable on Google.
Illyes explains:
“Most new URLs Google discovers are from other known pages that Google previously crawled.
You can think about a news site with different category pages that then link out to individual news articles.
Google can discover most published articles by revisiting the Category page every now and then and extracting the URLs that lead to the articles.”
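The discovery pattern Illyes describes, revisiting a known page and extracting the URLs it links to, can be sketched as a simple breadth-first crawl. The sketch below runs against a tiny in-memory "site" standing in for a news category page and its articles; the URLs, page contents, and `fetch` callback are all illustrative assumptions, not Google's actual implementation:

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical pages standing in for a news site's category page and articles.
PAGES = {
    "https://example.com/news/": (
        '<a href="/news/article-1">Article 1</a>'
        '<a href="/news/article-2">Article 2</a>'
    ),
    "https://example.com/news/article-1": '<a href="/news/">Back to news</a>',
    "https://example.com/news/article-2": "",
}

class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href=...> tags on one page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page they appear on.
                    self.links.append(urljoin(self.base_url, value))

def crawl(seed, fetch, delay=0.0):
    """Breadth-first URL discovery: follow links from known pages."""
    frontier = [seed]
    discovered = {seed}
    while frontier:
        url = frontier.pop(0)
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in discovered:
                discovered.add(link)
                frontier.append(link)
        time.sleep(delay)  # politeness delay between requests
    return discovered

discovered = crawl("https://example.com/news/", PAGES.get)
```

Revisiting only the seed (category) page is enough to surface both article URLs, which is the behavior Illyes describes.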
How Googlebot Crawls the Web
Googlebot starts by following links from known webpages to uncover new URLs, a process called URL discovery.
It avoids overloading sites by adjusting its crawl rate for each one individually, based on signals such as server response times and content quality.
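One simple way to adapt crawl speed to server health is to back off when responses slow down and cautiously speed up when they are fast. The heuristic and thresholds below are hypothetical illustrations, not Googlebot's actual algorithm:

```python
def next_delay(current_delay, response_time, slow_threshold=1.0):
    """Return the delay (seconds) before the next request to this host.

    Illustrative heuristic: double the delay when the server responds
    slowly, halve it when the server is fast, clamped to [0.1, 60.0].
    """
    if response_time > slow_threshold:
        return min(current_delay * 2, 60.0)  # back off on a slow server
    return max(current_delay * 0.5, 0.1)     # speed up gradually when healthy

# Example: a 2-second response doubles a 1-second delay to 2 seconds.
print(next_delay(1.0, 2.0))
```

In a real crawler this per-host delay would feed into the politeness pause between successive requests to the same site.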