A Google patent on “information gain score” was granted in June 2022. I believe it’s no coincidence that several algorithm updates – including the helpful content update – followed.
Is information gain score a key way for Google to prioritize valuable content that is “original, high-quality, people-first content demonstrating qualities E-E-A-T”?
My hypothesis: yes. Here’s why.
What is an information gain score?
An information gain score is essentially a measure of how much your content differs from the rest of the corpus. Here, the corpus would be all the candidate documents that Google analyses when ranking results for a particular query.
In the patent, most of the scenarios described calculate information gain scores only after the user has already run queries, viewed documents or viewed search results. It's a learning process specific to the individual, to the topic they're searching about, or both.
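To make the idea concrete, here is a toy sketch in Python. It is not the method described in the patent and not how Google computes anything. It assumes a deliberately crude stand-in for "new information": the share of a candidate document's distinct words that appear in none of the documents a user has already viewed. The function names (`tokens`, `toy_information_gain`) and the sample texts are hypothetical, made up for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in text (a crude representation)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_information_gain(candidate, viewed_docs):
    """Share of the candidate's distinct terms that appear in no viewed document.

    Assumption for illustration only: term overlap stands in for
    'information the user has already seen'.
    """
    candidate_terms = tokens(candidate)
    if not candidate_terms:
        return 0.0
    seen_terms = set()
    for doc in viewed_docs:
        seen_terms |= tokens(doc)
    return len(candidate_terms - seen_terms) / len(candidate_terms)


# Hypothetical documents the user has already viewed on the topic.
viewed = [
    "how to fix a flat bike tire",
    "bike tire repair steps",
]

# Hypothetical candidates that could be shown next.
candidates = [
    "fix a flat bike tire",
    "tubeless sealant fix",
]

# Rank candidates so the one adding the most unseen terms comes first.
ranked = sorted(
    candidates,
    key=lambda doc: toy_information_gain(doc, viewed),
    reverse=True,
)
for doc in ranked:
    print(round(toy_information_gain(doc, viewed), 2), doc)
```

Under this toy measure, a page that only repeats what the searcher has already read scores zero, while a page that introduces new terms scores higher. A real system would need a far richer representation of a document than word overlap, but the ordering logic is the point of the sketch.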
The late Bill Slawski wrote a technical breakdown of this process when the patent was still in review in 2020.
One of the interesting things I see in the patent language is this:
Google is leaving room for information gain scores to be calculated algorithmically and used as training data for machine learning models.
The need for a first set of documents to calculate the information gain score may become obsolete in the future: