Google Explains Reasons For Crawled Not Indexed

By Chris Barnhart On Jul 3, 2024

In May, Google’s Gary Illyes sat for an interview at SERP Conf 2024 in Bulgaria and answered a question about the causes of crawled but not indexed, offering several reasons that are helpful for debugging and fixing the issue.

The video of the interview went underreported and few people have actually watched it. I only heard of it because the always awesome Olesia Korobka (@Giridja) recently drew attention to it in a Facebook post.

So even though the interview happened in May, the information is still timely and useful.

Reason For Crawled – Currently Not Indexed

“Crawled – currently not indexed” is a status in the Page Indexing report in Google Search Console, indicating that Google crawled a page but did not index it.

During the live interview, someone submitted a question, asking:

“Can crawled but not indexed be a result of a page being too similar to other stuff already indexed?

So is Google suggesting there is enough other stuff already and your stuff is not unique enough?”

Google’s Search Console documentation doesn’t explain why Google may crawl a page and not index it, so it’s a legitimate question.

Gary Illyes answered that yes, one of the reasons could be that similar content is already indexed. But he went on to say that there are other reasons, too.

He answered:

“Yeah, that that could be one thing that it can mean. Crawled but not indexed is, ideally we would break up that category into more granular chunks, but it’s super hard because of how the data internally exists.

It can be a bunch of things, dupe elimination is one of those things, where we crawl the page and then we decide to not index it because there’s already a version of that or an extremely similar version of that content available in our index and it has better signals.

But yeah, but it it can be multiple things.”
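
Illyes didn’t describe how Google measures similarity, so the following is not Google’s method. As a rough way to sanity-check whether two of your own pages read as near-duplicates, a minimal Python sketch could compare their text using word shingles and Jaccard similarity. The function names and the three-word shingle size are assumptions made for illustration.

```python
from __future__ import annotations

import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page_a = "The quick brown fox jumps"
    page_b = "The quick brown fox sleeps"
    print(jaccard_similarity(page_a, page_b))
```

A score close to 1.0 suggests the two texts are nearly the same. What score counts as “too similar” is a judgment call, and nothing in the interview ties it to a specific threshold.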

General Quality Of Site Can Impact Indexing

Gary then called attention to another reason why Google might crawl but choose not to index a page, saying that it could be a site quality issue.

Illyes then continued his answer:

“And the general quality of the of the site, that can matter a lot of how many of these crawled but not indexed you see in search console. If the number of these URLs is very high that could hint at general quality issues.

And I’ve seen that a lot since February, where suddenly we just decided that we are indexing a vast amount of URLs on a site just because …our perception of the site has changed.”

Other Reasons For Crawled Not Indexed

Gary next offered other reasons why URLs might be crawled but not indexed, saying that Google’s perception of the site may have changed or that there may be a technical issue.

Gary explained:

“…And one possibility is that when you see that number rising, that the perception of… Google’s perception of the site has changed, that could be one thing.

But then there could also be that there was an error, for example on the site and then it served the same exact page to every single URL on the site. That could also be one of the reasons that you see that number climbing.

So yeah, there could be many things.”
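
The error Illyes describes, where a site serves the same exact page for every URL, is something you can check for yourself. A minimal sketch in Python, assuming you have already fetched the HTML for a sample of URLs into a dictionary, could hash each response body and group the URLs that share a hash. The function name and the whitespace normalization step are assumptions for illustration, not a description of how Google detects this.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def group_identical_pages(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose bodies are identical after collapsing whitespace.

    `pages` maps each URL to the HTML that was returned for it.
    Only groups containing more than one URL are returned.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        # Collapse runs of whitespace so trivial formatting differences don't matter.
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]


if __name__ == "__main__":
    sample = {
        "/shoes": "<h1>Something went wrong</h1>",
        "/hats": "<h1>Something went wrong</h1>\n",
        "/about": "<h1>About us</h1>",
    }
    print(group_identical_pages(sample))
```

If most of the sampled URLs land in a single group, that points to the kind of site-wide error Illyes mentions. Fetching the pages is left out of the sketch, and pages that embed per-request values such as timestamps would need those stripped before hashing.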

Takeaways

Gary provided answers that should help debug why a web page might be crawled but not indexed by Google:

Content is similar to content already in Google’s index
Exact same content exists on another site that has better signals
General site quality issues
Technical issues

Although Illyes didn’t elaborate on what he meant about another site with better signals, I’m fairly certain that he’s describing the scenario in which a site syndicates its content to another site and Google chooses to rank the other site for the content instead of the original publisher.

Watch Gary answer this question at the 9-minute mark of the recorded interview.

Featured Image by Shutterstock/Roman Samborskyi
