Trying to understand Google's guidelines for paywalled content.
My site works like this: paid content lives in a .paid-content element. When the paywall triggers, that element is removed and replaced with a .paywall element that says "Please buy a subscription to continue reading our site". Currently my JSON-LD looks like this:
"hasPart":[
{
"@type":"WebPageElement",
"isAccessibleForFree":false,
"cssSelector":".paid-content"
},
{
"@type":"WebPageElement",
"isAccessibleForFree":false,
"cssSelector":".paywall"
}
],
"isAccessibleForFree":false
Questions:
Should .paywall even be listed in the hasPart array? This element just says "Please buy a subscription"; it doesn't contain any text that is hidden from free users.
In my case, only one of these two elements will exist on the page at any given time. Is that OK, or will the Google crawler think it's a problem if it can't find all of the elements specified in the hasPart array?
For Google, hasPart > cssSelector is for indicating visually hidden content behind a paywall. In your example you're either completely removing the content or showing all of it publicly, so the schema is irrelevant and unnecessary in either case.
.paywall shouldn't be listed, because cssSelector should reference the class of an element wrapping the paywalled content itself, not just a paywall message (which is visible to all users).
.paid-content is wrapping content that is visible to all users, which would make that schema unnecessary as well since you should only target content visually hidden behind a paywall (see below and their second example).
I'm not certain how Google would react to this schema markup not matching the DOM, but I think it might be ignored in this case since they're looking for something very specific. Having a page with no content indexed is the bigger problem here.
The point of this paywall schema (from Google's standpoint) comes down to one major reason:
Publishers should enclose paywalled content with structured data to help Google differentiate paywalled content from the practice of cloaking, where the content served to Googlebot is different from the content served to users.
Cloaking (i.e. hiding content on a page for SEO gains) has been a favorite strategy of "black hatters" for many years. Google penalizes the practice where it can (BMW was famously delisted for it back in 2006) and has certainly done plenty of work on its algorithms to catch it automatically. The problem is that paywall sites like yours now also "hide content", but for different (and less dubious) reasons.
You are not visually hiding your content, though; instead you are stripping it off the page. The problem with this approach is that you risk Googlebot also hitting the paywall and not indexing the page properly, since the content simply isn't there. Even if you strip the content with JavaScript, it's a risk.
That's why typical paywall sites cover or hide content behind a CSS overlay, coupled with overflow: hidden on the body. That approach can look like a cloaking red flag to Google, which is presumably why they now ask publishers to use this markup (that last part is my assumption).
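For reference, the overlay approach described above usually looks roughly like this sketch (the class names and sizes here are illustrative, not from the question):

```html
<!-- Content stays in the DOM but is visually truncated for non-subscribers -->
<style>
  .paid-content {
    max-height: 300px;   /* show only a teaser */
    overflow: hidden;
    position: relative;
  }
  .paywall-overlay {
    position: absolute;
    bottom: 0;
    width: 100%;
    padding: 40px 0;
    text-align: center;
    /* fade the teaser out into the subscription prompt */
    background: linear-gradient(rgba(255, 255, 255, 0), #fff 60%);
  }
</style>

<article>
  <div class="paid-content">
    <p>Full article text, still present in the DOM for crawlers...</p>
    <div class="paywall-overlay">Please buy a subscription to continue reading.</div>
  </div>
</article>
```

Because the full text remains in the DOM, Googlebot can index it, which is exactly the situation the structured data is meant to explain.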
So taking that into consideration and looking at the Google examples from the link you provided, the cssSelector is just to say: "this content isn't some cloaking/blackhat trick, it's just paywalled, so let's still index it."
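Concretely, Google's documented pattern pairs markup like the following with content that stays in the DOM; here the .paywall class wraps the gated text itself (headline and class names are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example article",
  "isAccessibleForFree": false,
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": false,
    "cssSelector": ".paywall"
  }
}
</script>

<div class="non-paywall">Free teaser paragraph, visible to everyone.</div>
<div class="paywall">Gated content: present in the DOM, visually hidden from non-subscribers.</div>
```

Note that cssSelector points at the element containing the hidden content, not at the subscription prompt.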
Bottom line for you is that the schema in your example doesn't matter... because either you're showing users all of the content and have nothing to prove to Google, or you're displaying a page with no content and there's no cloaking issue for Google to care about.
So if this is your thing, the rule of thumb is: only add the hasPart/cssSelector markup when the paywalled content is present in the DOM but visually hidden; if you strip the content from the page entirely, the markup has nothing to point at.