Two trends have impacted how Google goes about indexing. While the open web has shrunk, Google has to find new content on big platforms like YouTube, Reddit, and TikTok, which are often built on “complex” JS frameworks. At the same time, AI is changing the underlying dynamics of the web by making mediocre and poor content redundant.
In my work with some of the biggest sites on the web, I’ve lately noticed an inverse relationship between indexed pages and organic traffic. More pages are not automatically bad, but they often don’t meet Google’s quality expectations. Or, to put it better, the definition of quality has changed. The stakes for SEOs are high: expand too aggressively, and your whole domain might suffer. We need to change our mindset about quality and develop monitoring systems that help us understand domain quality on a page level.
Satiated
Google has changed how it treats domains, starting around October 2023: I found no example of this inverse relationship before October. Google also had indexing issues when it launched the October 2023 core algorithm update, just as it did during the August 2024 update.
Before the change, Google indexed everything and prioritized the highest-quality content on a domain. Think about it like gold panning, where you fill a pan with gravel, soil and water and then swirl and stir until only valuable material remains.
Now, a domain and its content need to prove themselves before Google even tries to dig for gold. If the domain has too much low-quality content, Google might index only some pages or none at all in extreme cases.
One example is doordash.com, which added many pages over the last 12 months and lost organic traffic in the process. At least some, maybe all, of the new pages didn’t meet Google’s quality expectations.
But why? What changed? I reason that:
- Google wants to save resources and costs as the company moves toward an operational-efficiency mindset.
- Partial indexing is more effective against low-quality content and spam. Instead of indexing and then trying to rank new pages of a domain, Google observes the overall quality of a domain and handles new pages with corresponding skepticism.
- If a domain repeatedly produces low-quality content, it doesn’t get a chance to pollute Google’s index further.
- Google’s bar for quality has risen because there is so much more content on the web, but also because Google wants to optimize its index for RAG (grounding AI Overviews) and for training models.
This emphasis on domain quality as a signal means you have to change the way you monitor your website to account for quality. My guiding principle: “If you can’t add anything new or better to the web, it’s likely not good enough.”
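To make that principle operational, here is a minimal sketch of a page-level quality monitor. It assumes two hypothetical exports, `indexed_pages.csv` (URLs Google reports as indexed) and `gsc_performance.csv` (per-URL clicks and impressions from a Search Console performance export), and it treats “indexed but earning zero clicks” only as a rough proxy for pages that may not meet the bar, not as Google’s actual definition of quality.

```python
# Minimal sketch, assuming two hypothetical exports:
#   indexed_pages.csv    -> column: url (pages reported as indexed)
#   gsc_performance.csv  -> columns: url, clicks, impressions (Search Console performance data)
# "Zero clicks" is an illustrative low-quality proxy, not Google's definition.
import pandas as pd

indexed = pd.read_csv("indexed_pages.csv")
performance = pd.read_csv("gsc_performance.csv")

# Join indexed pages against performance data; pages with no performance row get 0 clicks.
merged = indexed.merge(performance, on="url", how="left")
merged = merged.fillna({"clicks": 0, "impressions": 0})

# Flag indexed pages that earn no organic clicks as "not pulling their weight."
merged["low_quality_proxy"] = merged["clicks"] == 0

ratio = merged["low_quality_proxy"].mean()
print(f"{len(merged)} indexed pages, {ratio:.1%} with zero clicks (low-quality proxy)")
```

Tracking that ratio over time, segmented by page type or template, makes it easier to spot when a new batch of pages starts dragging the domain-level picture in the wrong direction.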
Quality Food
Domain quality is my term for the ratio of indexed pages that meet Google’s quality standard to those that don’t. Note that only indexed pages count for quality. The maximum percentage of “bad” pages before Google reduces traffic to a domain is unclear, but we can certainly see when it’s met: