
How to protect e-mail addresses on a website from modern day JS-enabled bots?

This is a recurring question on the website, but after spending 20 minutes browsing through old questions I was unable to find a modern-day solution.

I previously used the JS-based method below to protect addresses. Before that I was using image- and Flash-based solutions. This is my old approach.

Animated example codepen: http://codepen.io/anon/pen/kIjKe/

HTML:

<span class="reverse eml">moc.niamod@tset</span><br>

CSS:

.reverse {
  unicode-bidi: bidi-override;
  direction: rtl;
}

.eml {
  display: inline;
}

JS:

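// Replace each reversed address with a clickable mailto: link.
function reverseEmails() {
  // .each() is a no-op on an empty selection, so no length check is needed.
  jQuery(".eml.reverse").each(function() {
    var that  = jQuery(this);
    // The markup stores the address backwards; flip it to recover the real one.
    var email = that.text().split("").reverse().join("");
    that.removeClass("reverse");
    that.html("<a href='mailto:" + email + "'>" + email + "</a>");
  });
}

// Run once the DOM is ready.
jQuery(reverseEmails);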
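
For completeness, here is a minimal sketch of the other half of this technique: producing the reversed markup in the first place. The helper names (`reverseText`, `obfuscatedSpan`, `mailtoLink`) are made up for illustration and are not part of the original code. It assumes the address contains no characters that need HTML escaping.

```javascript
// Reverse a string by Unicode code point (Array.from avoids splitting surrogate pairs).
function reverseText(text) {
  return Array.from(text).reverse().join("");
}

// Build the obfuscated span that goes into the page source (hypothetical helper).
function obfuscatedSpan(email) {
  return '<span class="reverse eml">' + reverseText(email) + "</span>";
}

// Undo the obfuscation, mirroring what the client-side script does.
function mailtoLink(reversed) {
  var email = reverseText(reversed);
  return "<a href='mailto:" + email + "'>" + email + "</a>";
}

console.log(obfuscatedSpan("test@domain.com"));
// <span class="reverse eml">moc.niamod@tset</span>
```

The CSS rule then flips the text visually for humans, while the raw source only ever contains the backwards string.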

None of these methods seems to work nowadays, since Node.js-based scrapers can render an image of the page they are scraping and then read any human-readable data from that image. You can guess the rest.

Is there any method that still works, one where users can easily read, click, and copy-paste e-mail addresses but JS-enabled bots cannot?

asked Mar 05 '15 14:03 by red


2 Answers

This is my personal favorite method, and it has worked for me so far. It is not bulletproof: in theory, a bot that can parse CSS3 and perform a text search on the result could still find the address. A spambot that triggers events in order to harvest e-mail addresses would have to feed the page into what is basically a headless browser and then somehow determine which content might be a JS-obfuscated e-mail address. Both scenarios are an enormous amount of work for possibly no benefit, and I doubt any spammer would consider it. The fact is that I have had no spam to date, and it works great for humans, who can both read the address and click on it:

<style>
  .email:after { content: '@mydomain.com'; }
</style>

Contact me at: <div class="email">myemail</div>

<script>
  $('.email').click(function() {
    window.location.href = 'mailto:' + $(this).html() + '@mydomain.com';
  });
</script>

The point is that the e-mail address is not a link, so bots never trigger the click event: they have no way of knowing it does anything.
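
The same idea can be sketched without jQuery. This is only an illustration, not the code I actually run: `buildMailto`, `wireEmailClicks` and the `DOMAIN` variable are names invented here, and `DOMAIN` is assumed to match whatever the CSS `content` rule appends. The `textContent` of the element does not include the CSS-generated part, which is why the domain has to be added back in script.

```javascript
// Join the two halves of the address only at click time.
function buildMailto(localPart, domain) {
  return "mailto:" + localPart + "@" + domain;
}

// Assumption: must match the value in the CSS `content` rule.
var DOMAIN = "mydomain.com";

// Attach a click handler to every .email element.
// `doc` and `win` are passed in so the function can be exercised outside a browser.
function wireEmailClicks(doc, win) {
  var nodes = doc.querySelectorAll(".email");
  Array.prototype.forEach.call(nodes, function (node) {
    node.addEventListener("click", function () {
      win.location.href = buildMailto(node.textContent.trim(), DOMAIN);
    });
  });
}

// Only wire things up when running in a browser.
if (typeof document !== "undefined") {
  wireEmailClicks(document, window);
}
```

Keeping the string assembly in a small pure function makes it easy to check on its own, while the full address still never appears in the page source.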

answered Sep 23 '22 03:09 by Neo


Put the email address on a separate page which is only reachable by solving a CAPTCHA.

Granted, then the security is only as good as the security of the CAPTCHA.
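
As a rough sketch of the gating logic only (Node.js, using the built-in `crypto` module): the server hands out a question plus a signed token and reveals the address only when the submitted answer matches. Everything named here (`SECRET`, `PROTECTED_EMAIL`, `createChallenge`, `revealEmail`) is hypothetical, and the arithmetic question is merely a stand-in for a real CAPTCHA, which a bot would solve trivially. A real deployment would also need to expire tokens so they cannot be replayed.

```javascript
const crypto = require("crypto");

// Hypothetical values for illustration only.
const SECRET = "change-me";
const PROTECTED_EMAIL = "test@domain.com";

// Sign the expected answer so the server needs no per-visitor state.
function sign(answer) {
  return crypto.createHmac("sha256", SECRET).update(String(answer)).digest("hex");
}

// The question is shown to the visitor; the token travels with the form.
function createChallenge(a, b) {
  return { question: "What is " + a + " + " + b + "?", token: sign(a + b) };
}

// Return the address only when the submitted answer matches the signed token.
function revealEmail(token, submittedAnswer) {
  const expected = Buffer.from(String(token), "hex");
  const actual = Buffer.from(sign(submittedAnswer), "hex");
  const ok = expected.length === actual.length &&
    crypto.timingSafeEqual(expected, actual);
  return ok ? PROTECTED_EMAIL : null;
}
```

The address itself never appears in any page served before the challenge is passed, which is the whole point of the approach.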

Using your own obfuscations may be a serious alternative if you only have a limited number of addresses you want to protect. Some ideas I have used in the past:

  • Crossword puzzle. Make it really easy, with clues like famous song titles with one word missing (easy to google and no debate about possible second interpretations). You can fill in many letters to make it even easier.
  • Audio recording with background noise. I didn't want to use my own voice so I used a speech synthesizer, with a German accent (-: AT&T web demo IIRC) and mixed in a couple of seconds of music in the background (Frank Zappa's Peaches en Regalia worked very well for me, but tastes differ).
  • Hand-drawn image. I like to draw letter outlines but I doubt they are regular enough to pass any OCR.

The real beef here is not the stellar brilliance of these solutions, but the different approaches which I hope can inspire you to think in new directions. In the end, you will always be safer if you come up with your own unique solution; anything resembling a "new de facto standard" will be the lowest-hanging fruit that the scrapers will spend time trying to pluck.

Incidentally, I tried to think about usability for people with disabilities, so I actually deployed the audio version as a fallback for people who have issues with interacting with the other two, which are based on visual layout.

By the by, very few people want to send me email these days anyway (or maybe they do, but end up being rejected as spam?) which is frankly a relief. Those who do typically use the whois registration info for my domain name (which uses an anonymized address provided by the whois registrar) or are good guessers.

answered Sep 23 '22 03:09 by tripleee