Google updated its Googlebot and crawler documentation to add a range of IPs for bots triggered by users of Google products. The names of the feeds have switched, which matters for publishers who whitelist Google-controlled IP addresses. The change will also be useful for publishers who want to block scrapers that use Google’s cloud, as well as other crawlers not directly associated with Google itself.
Google says that the list contains IP ranges that have long been in use, so they’re not new IP address ranges.
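There are two kinds of IP address ranges: those used by user-triggered fetchers that Google controls, and those used by user-triggered fetchers that Google does not control, such as apps running on Google Cloud.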
The lists that correspond to each category are different now.
Previously the list that corresponded to Google IP addresses was this one: special-crawlers.json (resolving to gae.googleusercontent.com)
Now the “special crawlers” list corresponds to crawlers that are not controlled by Google.
“IPs in the user-triggered-fetchers.json object resolve to gae.googleusercontent.com hostnames. These IPs are used, for example, if a site running on Google Cloud (GCP) has a feature that requires fetching external RSS feeds on the request of the user of that site.”
The new list that corresponds to Google controlled crawlers is:
user-triggered-fetchers-google.json
“Tools and product functions where the end user triggers a fetch. For example, Google Site Verifier acts on the request of a user. Because the fetch was requested by a user, these fetchers ignore robots.txt rules.
Fetchers controlled by Google originate from IPs in the user-triggered-fetchers-google.json object and resolve to a google.com hostname.”
The list of IPs from Google Cloud and App crawlers that Google doesn’t control can be found here:
https://developers.google.com/static/search/apis/ipranges/user-triggered-fetchers.json
The list of IPs for fetchers that are triggered by users and controlled by Google is here:
https://developers.google.com/static/search/apis/ipranges/user-triggered-fetchers-google.json
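For publishers who want to act on the two lists above, a check of a visitor IP against one of them might look like the following minimal sketch. It assumes each file is a JSON object with a "prefixes" array whose entries carry an "ipv4Prefix" or "ipv6Prefix" CIDR string; that shape is an assumption to confirm against the live files. The function names are hypothetical, not part of any Google library.

```python
import ipaddress
import json
from urllib.request import urlopen

# URLs as given in the article.
GOOGLE_CONTROLLED_URL = "https://developers.google.com/static/search/apis/ipranges/user-triggered-fetchers-google.json"
NOT_GOOGLE_CONTROLLED_URL = "https://developers.google.com/static/search/apis/ipranges/user-triggered-fetchers.json"


def fetch_ranges(url):
    """Download and parse one of the published lists (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def load_networks(ranges):
    """Convert a parsed list into ip_network objects.

    Assumed shape: {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}
    """
    networks = []
    for entry in ranges.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def ip_in_ranges(ip, networks):
    """Return True if the address falls inside any of the networks."""
    address = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network, and vice versa.
    return any(address in network for network in networks)
```

In practice a site would call fetch_ranges() on a schedule, cache the result of load_networks(), and run ip_in_ranges() per request, since the published ranges can change over time.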
There is a new section of content that explains what the new list is about.
“Fetchers controlled by Google originate from IPs in the user-triggered-fetchers-google.json object and resolve to a google.com hostname. IPs in the user-triggered-fetchers.json object resolve to gae.googleusercontent.com hostnames. These IPs are used, for example, if a site running on Google Cloud (GCP) has a feature that requires fetching external RSS feeds on the request of the user of that site.”
The hostnames take the form ***-***-***-***.gae.googleusercontent.com or google-proxy-***-***-***-***.google.com, and the corresponding lists are user-triggered-fetchers.json and user-triggered-fetchers-google.json.
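The hostname masks quoted above can be turned into a simple classifier. The sketch below reads each *** in the mask as a run of one to three digits, which is an assumption about what the mask stands for. The function name and labels are hypothetical.

```python
import re

# Patterns derived from the quoted masks; "***" is assumed to mean 1-3 digits.
GAE_PATTERN = re.compile(r"^\d{1,3}(-\d{1,3}){3}\.gae\.googleusercontent\.com$")
GOOGLE_PROXY_PATTERN = re.compile(r"^google-proxy-\d{1,3}(-\d{1,3}){3}\.google\.com$")


def classify_fetcher_hostname(hostname):
    """Label a reverse DNS hostname by which list it appears to belong to."""
    # Normalize case and drop a trailing root dot, if any.
    host = hostname.rstrip(".").lower()
    if GOOGLE_PROXY_PATTERN.match(host):
        return "google-controlled"
    if GAE_PATTERN.match(host):
        return "not-google-controlled"
    return "unknown"
```

The hostname would typically come from a reverse DNS lookup on the visitor IP (for example with Python's socket.gethostbyaddr), followed by a forward lookup to confirm that the hostname maps back to the same IP. A hostname match alone is not proof, because reverse DNS records can be set by whoever controls the IP block.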
Google’s changelog explained the changes like this:
“Exporting an additional range of Google fetcher IP addresses
What: Added an additional list of IP addresses for fetchers that are controlled by Google products, as opposed to, for example, a user controlled Apps Script. The new list, user-triggered-fetchers-google.json, contains IP ranges that have been in use for a long time.
Why: It became technically possible to export the ranges.”
Read the updated documentation:
Verifying Googlebot and other Google crawlers
Read the old documentation:
Archive.org – Verifying Googlebot and other Google crawlers
Featured Image by Shutterstock/JHVEPhoto