Google’s flexible sampling solution, which replaced the first-click-free policy for gated, subscription, or paywalled content, launched in 2017. Since then, many publishers have used the paywall structured data to show Google the full content that sits behind the content gate. Some are calling this solution “leaky”; Google responded that it is not.
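For context, the paywall structured data Google documents is a small JSON-LD block that flags the article as not freely accessible and points at the section of the page that sits behind the gate. Below is a minimal sketch of that markup, generated here with Python; the headline and the “.paywall” selector are placeholder values, not taken from any specific publisher.

```python
import json

# Minimal paywall structured data, following the NewsArticle pattern in
# Google's paywalled-content documentation. The headline and the
# ".paywall" CSS selector are placeholders for illustration.
structured_data = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Example subscriber-only article",
    "isAccessibleForFree": False,
    "hasPart": {
        "@type": "WebPageElement",
        "isAccessibleForFree": False,
        "cssSelector": ".paywall",
    },
}

# Embed the JSON-LD in the page so a crawler that is shown the full
# content can also see which part of it is gated.
print('<script type="application/ld+json">')
print(json.dumps(structured_data, indent=2))
print("</script>")
```

On the page itself, the gated portion of the article would sit inside an element matching that selector, for example a div with class="paywall".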
Ryan Singel, a journalist covering the tech business, tech policy, civil liberties and privacy issues, who has written for Wired and many other respected publications, posted a comment on this site calling this Google solution “leaky.” He said:
Google Search and Google News are stuck in the past when it comes to these. Its crawler assumes that paywalled or reg-walled content is still going to be in the HTML that Google’s crawler will see. In other words, it demands leaky, bad tech from sites with paywalled or registration-required content. It’d be great if it fixed that instead of sending Danny Sullivan out to lecture sites about their markup with directions that don’t work for a smart, modern, non-leaky publishing system.
Danny Sullivan, Google’s Search Liaison, then responded to that comment on this blog, on X, and on Mastodon, saying it is not leaky. Here is Danny’s response from this blog:
Our system is looking to be shown the full content, if a publisher wants to do that. If they do, we understand more about it. If we understand more, then we might be able to show it for more queries where it’s relevant. This doesn’t involve using JS to somehow “hide” the content from people who aren’t our crawler or anything like that.
Basically, you see our crawler, you show us the full content. And only us. And if you’re worried that someone is pretending to be us, then you check our publicly shared IP addresses.
Next, you markup the page so we know what’s paywalled / gated content so that we — and only we are seeing this full content — also know you aren’t trying to cloak us by targeting our crawler specifically. Since only we are seeing this, there’s nothing “leaky” as you are suggesting. Here’s the doc.
Where the “leaky” stuff tends to come in is someone might search with us, then click on the cached copy of a page to see the full thing we saw. And if that’s a concern, our guidance is to block the cached copy — covered in the docs.
I hope that helps explain this more. If I’m missing something, or you have other suggestions, honestly very happy to hear them. I found Outpost and emailed both the info and press addresses, so look for that, happy to continue the conversation.
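On Danny’s point about checking Google’s publicly shared IP addresses: Google publishes the Googlebot ranges as a JSON file, and a server can check a visiting IP against those ranges before deciding to serve the full content. Here is a rough sketch of that check in Python; the URL is the one Google’s Googlebot verification documentation points to, and the sample address is only an illustration.

```python
import ipaddress
import json
import urllib.request

# Google's published list of Googlebot IP ranges, as referenced in its
# "Verifying Googlebot" documentation.
GOOGLEBOT_RANGES_URL = (
    "https://developers.google.com/static/search/apis/ipranges/googlebot.json"
)


def is_googlebot_ip(client_ip: str) -> bool:
    """Return True if client_ip falls inside a published Googlebot range."""
    with urllib.request.urlopen(GOOGLEBOT_RANGES_URL) as resp:
        data = json.load(resp)
    ip = ipaddress.ip_address(client_ip)
    for prefix in data.get("prefixes", []):
        cidr = prefix.get("ipv4Prefix") or prefix.get("ipv6Prefix")
        if cidr and ip in ipaddress.ip_network(cidr):
            return True
    return False


if __name__ == "__main__":
    # 66.249.66.1 sits in Googlebot's well-known address space; any real
    # check would use the connecting client's IP instead.
    print(is_googlebot_ip("66.249.66.1"))
```

Google’s documentation also describes reverse DNS verification, where the crawler’s host name should resolve back into googlebot.com or google.com, as an alternative to checking the IP list.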
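And on blocking the cached copy, the guidance Danny references comes down to the noarchive robots rule, which a publisher can deliver either as a robots meta tag in the page or as an X-Robots-Tag response header. A small sketch of both forms, again in Python:

```python
# Two equivalent ways to ask Google not to keep a cached copy of a page,
# per the noarchive robots rule covered in Google's documentation.

# 1) A robots meta tag placed in the page <head>:
NOARCHIVE_META = '<meta name="robots" content="noarchive">'

# 2) The same rule as an HTTP response header, set by the server or app:
NOARCHIVE_HEADER = ("X-Robots-Tag", "noarchive")

print(NOARCHIVE_META)
print("{}: {}".format(*NOARCHIVE_HEADER))
```

Either form keeps the full, ungated content that Googlebot was shown from being reachable through a cached link in the search results.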