Googlebot Crawls & Indexes First 15 MB HTML Content

In an update to Googlebot’s help document, Google quietly announced that Googlebot crawls only the first 15 MB of a webpage’s HTML. Anything after this cutoff is not considered for indexing, so it will not be included in ranking calculations.

Google specifies in the help document:

“Any resources referenced in the HTML such as images, videos, CSS and JavaScript are fetched separately. After the first 15 MB of the file, Googlebot stops crawling and only considers the first 15 MB of the file for indexing. The file size limit is applied on the uncompressed data.”

This left some in the SEO community wondering if this meant Googlebot would completely disregard text that fell below images at the cutoff in HTML files.

“It’s specific to the HTML file itself, like it’s written,” John Mueller, Google Search Advocate, clarified via Twitter. “Embedded resources/content pulled in with IMG tags is not a part of the HTML file.”

What This Means For SEO

To ensure important content is considered by Googlebot, it must sit within the first 15 MB of an HTML or supported text-based file. In practice, this means structuring the code so that SEO-relevant information appears before the cutoff.

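It also means images and videos should be compressed and, whenever possible, not encoded directly into the HTML (for example, as base64 data URIs), because inlined data counts toward the size of the HTML file itself.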
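As a rough illustration of how inlined media might be spotted, the following Python sketch scans an HTML string for base64 data URIs and reports how many characters each one occupies. The function name `inline_data_uris` and the regular expression are assumptions made for this example, not part of any Google tool, and the pattern only covers the common `data:<type>/<subtype>;base64,` form.

```python
import re

# Hypothetical pattern for illustration: matches URIs such as
# data:image/png;base64,AAAA... and captures the MIME type.
DATA_URI = re.compile(r"data:([\w.+-]+/[\w.+-]+);base64,([A-Za-z0-9+/=]+)")


def inline_data_uris(html: str) -> list[tuple[str, int]]:
    """Return (mime_type, length_in_characters) for each base64 data URI."""
    return [(m.group(1), len(m.group(0))) for m in DATA_URI.finditer(html)]


sample = '<img src="data:image/png;base64,iVBORw0KGgo="><img src="/logo.png">'
for mime_type, length in inline_data_uris(sample):
    # Only the first image is inlined; the second is fetched separately.
    print(f"{mime_type}: {length} characters inside the HTML")
```

Large values in this report point to media that could be moved into separate files, which Googlebot fetches on their own and which do not count toward the HTML limit.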

SEO best practices currently recommend keeping HTML pages to 100 KB or less, so many sites will be unaffected by this change. Page size can be checked with a variety of tools, including Google PageSpeed Insights.
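For a quick local check, a minimal Python sketch like the one below compares the uncompressed size of an HTML document against the 15 MB cutoff. It assumes "15 MB" means 15 * 1024 * 1024 bytes, which Google's document does not spell out, and the names `GOOGLEBOT_HTML_LIMIT` and `html_size_report` are invented for this example. The gzip figure is included only for contrast, since the limit applies to uncompressed data.

```python
import gzip

# Assumption: "15 MB" is read as binary megabytes. Google's help document
# does not say whether it means 15 * 1024 * 1024 or 15,000,000 bytes.
GOOGLEBOT_HTML_LIMIT = 15 * 1024 * 1024


def html_size_report(html: bytes, limit: int = GOOGLEBOT_HTML_LIMIT) -> dict:
    """Compare the uncompressed size of an HTML document against a limit."""
    size = len(html)
    return {
        "uncompressed_bytes": size,
        # Compressed transfer size, shown for contrast only.
        "gzip_bytes": len(gzip.compress(html)),
        "over_limit": size > limit,
        # How many bytes at the end of the file fall after the cutoff.
        "bytes_past_cutoff": max(0, size - limit),
    }


page = b"<html><body>" + b"<p>hello world</p>" * 1000 + b"</body></html>"
report = html_size_report(page)
print(report)
```

To run this against a real page, the bytes of the saved HTML file can be passed in, for instance the result of reading the file in binary mode.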

In theory, it may sound worrisome that you could have content on a page that doesn’t get used for indexing. In practice, however, 15 MB is a very large amount of HTML.

As Google states, resources such as images and videos are fetched separately. Based on Google’s wording, it sounds like this 15 MB cutoff applies to HTML only.

It would be difficult to go over that limit with HTML unless you were publishing entire books’ worth of text on a single page.

Should you have pages that exceed 15 MB of HTML, it’s likely you have underlying issues that need to be fixed anyway.


Source: Google Search Central
Featured Image: SNEHIT PHOTO/Shutterstock


Published by
Chris Barnhart
