
Is it possible to protect JSON-LD from email harvesters?

I want to use JSON-LD for SEO purposes, but I'm not sure how to prevent an automated email harvester from picking up the address(es) from the source.

Schema.org's email property requires you to supply an email address. I've always obfuscated email addresses in some way, either by using JS to display them or by other methods. This has helped stop spam so far.

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Person",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Seattle",
    "addressRegion": "WA",
    "postalCode": "98052",
    "streetAddress": "20341 Whitworth Institute 405 N. Whitworth"
  },
  "colleague": [
    "http://www.xyz.edu/students/alicejones.html",
    "http://www.xyz.edu/students/bobsmith.html"
  ],
  "email": "mailto:[email protected]",
  "image": "janedoe.jpg",
  "jobTitle": "Professor",
  "name": "Jane Doe",
  "telephone": "(425) 123-4567",
  "url": "http://www.janedoe.com"
}
</script>

The only approach I can think of is using JS to create the above dynamically. I would expect most harvesters to be unable to interpret that, but it would most likely break search engine support too. Is there any solution to this?

cryptic ツ asked Dec 31 '15 02:12


2 Answers

Unless you can detect the malicious bot (and serve it a version without the email address), there is no sensible solution. One of the main reasons for using structured data is giving bots easy access, so this is by design.

You could try to make getting the email address harder:

  • Schema.org’s email property expects Text as value, so obfuscation could be used (e.g., jane-doe at {this domain}).
    Hope: bots don’t understand your obfuscation method by default.

  • If the use of Schema.org’s email property is not required: FOAF’s mbox_sha1sum property expects a SHA1 hashed email address.
    Hope: bots don’t try to (or didn’t already) find the corresponding email address.

  • You could use JavaScript to add the email property (Google supports it, for example).
    Hope: bots don’t execute JavaScript.
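
The first two options in the list above can be sketched in Python. This is a minimal illustration under stated assumptions, not a vetted recipe: the helper names `obfuscate_email`, `mbox_sha1sum` and `person_jsonld`, the address `jane-doe@example.com`, and the " at " / " dot " spelling are all made up for the example. FOAF defines mbox_sha1sum as the SHA1 of the full mailto: URI, so the prefix is added before hashing. Whether any particular search engine reads the FOAF property is not something this thread confirms.

```python
import hashlib
import json


def obfuscate_email(address: str) -> str:
    """Spell out '@' and '.' so simple pattern-matching harvesters miss the address."""
    local, _, domain = address.partition("@")
    return f"{local} at {domain.replace('.', ' dot ')}"


def mbox_sha1sum(address: str) -> str:
    """Return the SHA1 hex digest of the mailto: URI, as FOAF's mbox_sha1sum expects."""
    return hashlib.sha1(f"mailto:{address}".encode("utf-8")).hexdigest()


def person_jsonld(name: str, address: str, hashed: bool = False) -> str:
    """Build a Person document with either an obfuscated or a hashed address."""
    doc = {
        # Assumption: Schema.org as default vocabulary, FOAF under a prefix.
        "@context": {
            "@vocab": "http://schema.org/",
            "foaf": "http://xmlns.com/foaf/0.1/",
        },
        "@type": "Person",
        "name": name,
    }
    if hashed:
        doc["foaf:mbox_sha1sum"] = mbox_sha1sum(address)
    else:
        doc["email"] = obfuscate_email(address)
    return json.dumps(doc, indent=2)


# Placeholder address for illustration only.
print(person_jsonld("Jane Doe", "jane-doe@example.com"))
```

Note that a hash only hides the address from someone who cannot guess it: anyone holding a list of candidate addresses can hash each one and compare.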

But this makes it harder for good bots too, of course, and at a certain point you might want to consider not providing the email address at all.

If you only want to provide the email address to certain consumers, you could serve these consumers the document that contains the email address, and all other bots the one without. But search engine bots might not like this method. And you disadvantage new consumers, or consumers you don’t know.
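
Serving different documents to different consumers could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the allow-list `TRUSTED_AGENT_TOKENS`, the function names, and the placeholder address. It only inspects the User-Agent string, which any client can set to whatever it likes, so on its own it will not stop a harvester that claims to be a search engine.

```python
import json

# Hypothetical allow-list of user-agent tokens for consumers that may see the address.
TRUSTED_AGENT_TOKENS = ("Googlebot", "bingbot")

# Placeholder data; the address is illustrative only.
PERSON = {
    "@context": "http://schema.org",
    "@type": "Person",
    "name": "Jane Doe",
    "email": "mailto:jane-doe@example.com",
}


def is_trusted_consumer(user_agent: str) -> bool:
    """Case-insensitive substring match; a spoofed header passes this check."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in TRUSTED_AGENT_TOKENS)


def person_jsonld_for(user_agent: str, person: dict = PERSON) -> str:
    """Return the JSON-LD document, dropping 'email' for unknown consumers."""
    doc = dict(person)  # shallow copy so the shared data is left untouched
    if not is_trusted_consumer(user_agent):
        doc.pop("email", None)
    return json.dumps(doc, indent=2)
```

A caller would pass the request's User-Agent header to `person_jsonld_for` and embed the result in the page's `application/ld+json` script element.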

I would just provide the email address unobfuscated and for everyone, making the life of visitors (humans as well as bots) easier. Spam should be your problem, not theirs; and it’s a problem that can be handled.

unor answered Nov 17 '22 03:11


JSON-LD makes data readily available to robots, including email harvesters, which can easily spoof the identity of other bots. I suggest leaving the email addresses out of the JSON-LD: it won't hurt SEO, and the owners of those addresses will love you for it. Otherwise you -will- make their mailboxes a constant target for spam.

Lukasz Korzeniowski answered Nov 17 '22 02:11