I'm pretty sure that many people have thought of this, but for some reason I can't find it using Google and StackOverflow search.
I would like to make an invisible link (blacklisted by robots.txt) to a CGI or PHP page that will "trap" malicious bots and spiders. So far, I've tried:
Empty links in the body:
<a href='/trap'><!-- nothing --></a>
This works quite nicely most of the time, with two minor problems:
Problem: The link is part of the body of the document. Even though it is pretty much unclickable with a mouse, some visitors still inadvertently hit it while keyboard-navigating the site with Tab and Enter. Also, if they copy-paste the page into a word processor or e-mail software, for example, the trap link is copied along and is sometimes even clickable (some software doesn't like empty <a> tags and copies the href as the contents of the tag).
Invisible blocks in the body:
<div style="display:none"><a href='/trap'><!-- nothing --></a></div>
This fixes the problem with keyboard navigation, at least in the browsers I tested. The link is effectively inaccessible from the normal display of the page, while still fully visible to most spider bots with their current level of intelligence.
Problem: The link is still part of the DOM. If the user copy-pastes the contents of the page, it reappears.
Inside comment blocks:
<!-- <a href='/trap'>trap</a> -->
This effectively removes the link from the DOM of the page. Well, technically, the comment is still part of the DOM, but it achieves the desired effect that compliant user-agents won't generate the A element, so it is not an actual link.
Problem: Most spider bots nowadays are smart enough to parse (X)HTML and ignore comments. I've personally seen bots that use Internet Explorer COM/ActiveX objects to parse the (X)HTML and extract all links through XPath or JavaScript. These types of bots are not tricked into following the trap hyperlink.
I was using method #3 until last night, when I was hit by a swarm of bots that seem to be really selective about which links they follow. Now I'm back to method #2, but I'm still looking for a more effective way.
Any suggestions, or a different solution that I missed?
Use CSS styling to make your links invisible. The first way is to set the pointer-events CSS property to none. The other is simply to color the text to match the background of the page. Neither method hides the link from someone who inspects the HTML source code.
You can do this for a link anywhere inside the <body></body> tag, and you can also make the link not have an underline. Defining a style property this way is called inline styling: the style is specified "inline," in the element itself, in the body of your page.
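As a rough sketch of what that inline styling could look like, the snippet below builds the trap anchor as a string. The helper name buildTrapLink and its two parameters are made up for illustration, and the background color is an assumption you would replace with your page's actual color.

```javascript
// Hypothetical helper (not from the answer above): returns the trap anchor
// with the inline styles described in the text.
function buildTrapLink(href, backgroundColor) {
  var style = [
    'pointer-events: none',       // mouse clicks pass through the link
    'color: ' + backgroundColor,  // link text blends into the page background
    'text-decoration: none'       // no underline
  ].join('; ');
  return '<a href="' + href + '" style="' + style + '">trap</a>';
}

// Assumes a white page background.
console.log(buildTrapLink('/trap', '#ffffff'));
```

Note that pointer-events: none only affects pointer input, so a link styled this way can probably still be reached with Tab.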
Add it like you said:
<a id="trap" href='/trap'><!-- nothing --></a>
And then remove it with JavaScript/jQuery:
$('#trap').remove();
Spam bots won't execute the JavaScript, so they will still see the element. Almost any browser will remove the element, making it impossible to reach by tabbing to it.
Edit: The easiest non-jQuery way would be:
<div id="trapParent"><a id="trap" href='/trap'><!-- nothing --></a></div>
And then remove it with JavaScript:
var parent = document.getElementById('trapParent');
var child = document.getElementById('trap');
parent.removeChild(child);
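One thing to watch is timing: the removal only works if the element already exists when the script runs. A minimal sketch of a guarded version follows, assuming the script is loaded before the page finishes parsing. The function name removeTrapLink and the doc parameter are illustrative, not part of the answer above.

```javascript
// Removes the trap link from a document-like object.
// `doc` is passed in (an assumption for illustration) so the function can
// also be tried outside a browser.
function removeTrapLink(doc) {
  var child = doc.getElementById('trap');
  // Guard: the element may be missing or already detached.
  if (child && child.parentNode) {
    child.parentNode.removeChild(child);
    return true;
  }
  return false;
}

// In a browser, run once the DOM has been parsed.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    removeTrapLink(document);
  });
}
```

Using child.parentNode avoids needing a separate id on the wrapper element.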
This solution seems to work well for me; luckily I had bookmarked it. I hope it helps you as well.
You can create a hidden link like this and put it at the very top left of your page. To prevent regular users from reaching it too easily, you can use CSS to lay a logo image over this image.
<a href="/bottrap.php"><img src="images/pixel.gif" border="0" alt=" " width="1" height="1"></a>
If you are interested in how to set up blacklisting of the bots, refer to this link for a detailed explanation:
http://www.webmasterworld.com/apache/3202976.htm
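For a feel of the general idea, here is a minimal in-memory sketch of what the trap side might do. It is not the method from the linked page. The function names and the use of a Set are assumptions, and a real setup would persist the list (for example in a file, a database, or server deny rules).

```javascript
// Hypothetical in-memory blacklist; illustration only.
var blacklistedIps = new Set();

// Call this from whatever serves the trap URL (e.g. /trap or /bottrap.php).
function recordTrapHit(ip) {
  blacklistedIps.add(ip);
}

// Call this at the start of every other request to decide whether to refuse it.
function isBlacklisted(ip) {
  return blacklistedIps.has(ip);
}

recordTrapHit('203.0.113.7');                // stand-in address from a documentation range
console.log(isBlacklisted('203.0.113.7'));   // true
console.log(isBlacklisted('198.51.100.2'));  // false
```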