We want to add page-view counters to our article pages (just like Stack Overflow has), but we don't want to count page views from bots and crawlers.
I searched quite a bit and only found very outdated answers that say to fire an AJAX request, since crawlers and bots don't execute JavaScript. Well, it's 2016, and I believe all the major crawlers execute JavaScript nowadays.
I thought about two viable solutions. The second one is to fire a request to a tracking URL that robots.txt disallows (or to use a hidden image with src="/article/track/?id=xxxxx"). That option creates an extra request per page, which isn't horrible, but maybe there's a better way? What is the common way of handling this today?
We're using ASP.NET Core and storing the page views in Redis, if it matters.
I found out how Stack Overflow itself handles it:
<script>
  StackExchange.ready(function(){$.get('/posts/40008735/ivc/e079');});
</script>
<noscript>
  <div>
    <img src="/posts/40008735/ivc/e079" class="dno" alt="" width="0" height="0">
  </div>
</noscript>
And in robots.txt:
Disallow: /*/ivc/*
...
User-agent: Googlebot-Image
Disallow: /*/ivc/*
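To see why `Disallow: /*/ivc/*` covers a URL like /posts/40008735/ivc/e079: in Google-style robots.txt matching (also described in RFC 9309), `*` matches any sequence of characters, a trailing `$` anchors the end of the path, and otherwise a rule only has to match a prefix of the path. The following is a minimal sketch of that matching that translates a pattern into a regular expression. `robotsPatternToRegExp` and `isDisallowed` are hypothetical helpers, and the sketch deliberately ignores Allow/Disallow precedence and percent-encoding.

```javascript
// Sketch of Google/RFC 9309-style robots.txt path matching:
// "*" matches any run of characters, a trailing "$" anchors the end,
// and otherwise the pattern only needs to match a prefix of the path.
// Allow/Disallow precedence and percent-encoding are ignored here.
function robotsPatternToRegExp(pattern) {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  // Escape regex metacharacters in each literal piece, then join the
  // pieces with ".*" so every "*" becomes "any sequence of characters".
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  // "^" gives prefix matching; "$" is added only for anchored patterns.
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// True if any Disallow pattern matches the given URL path.
function isDisallowed(path, disallowPatterns) {
  return disallowPatterns.some((p) => robotsPatternToRegExp(p).test(path));
}
```

For example, `isDisallowed("/posts/40008735/ivc/e079", ["/*/ivc/*"])` is true, while an ordinary question URL without an `/ivc/` segment is not matched. Because matching is prefix-based, the trailing `*` in `/*/ivc/*` is redundant but harmless.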
So basically, they handle it as I suggested in option 2: issue an AJAX request (or load a hidden img when JavaScript is disabled) and tell crawlers and bots not to crawl that URL with a Disallow rule in robots.txt.
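The server side of that approach has to answer the tracking URL and increment the counter. Here is a minimal Node.js sketch, assuming the /posts/{id}/ivc/{token} path shape from Stack Overflow's snippet (the trailing token is captured but ignored, since its meaning isn't documented here) and a `Map` standing in for Redis. With Redis you would run `INCR` on a per-article key instead. `handleTrackingRequest` and `pageViews` are hypothetical names.

```javascript
// Hypothetical tracking-endpoint logic modeled on /posts/{id}/ivc/{token}.
// A Map stands in for Redis; with Redis you would INCR a per-article key.
const pageViews = new Map(); // articleId -> view count

// Matches e.g. "/posts/40008735/ivc/e079"; the token is captured but unused.
const TRACK_PATH = /^\/posts\/(\d+)\/ivc\/([A-Za-z0-9]+)$/;

// Increments the counter for the article in a tracking URL.
// Returns the article id, or null if the path is not a tracking URL.
function handleTrackingRequest(path) {
  const match = TRACK_PATH.exec(path);
  if (match === null) return null;
  const articleId = match[1];
  pageViews.set(articleId, (pageViews.get(articleId) || 0) + 1);
  return articleId;
}
```

Whatever routes the request (an ASP.NET Core endpoint in the asker's case) would call this and return an empty response, for example 204 No Content. Because the URL is disallowed in robots.txt, crawlers that respect robots.txt never request it and so never increment the counter.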
As I mentioned in chat, you could cache the IP address of any client that requests /robots.txt. On other requests, check whether the IP address is in the cache, and don't count the request as a page view if it is.
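A minimal sketch of that idea, assuming an in-memory `Map` with an expiry time per IP. The 24-hour TTL is an arbitrary choice, and in the asker's setup the set could live in Redis with a key expiry instead. `recordRequest` and `shouldCountPageView` are hypothetical names. Note that this only catches clients that actually fetch /robots.txt: bots that never request it are still counted, and human visitors sharing an IP with such a crawler are not.

```javascript
// Sketch: remember IPs that requested /robots.txt and skip counting their
// page views until the entry expires. The Map and TTL are assumptions; a
// shared store such as Redis (with key expiry) would replace them.
const ROBOTS_TTL_MS = 24 * 60 * 60 * 1000; // assumed: 24 hours
const robotsRequesters = new Map(); // ip -> expiry timestamp (ms)

// Call for every incoming request; only /robots.txt fetches are recorded.
function recordRequest(ip, path, now = Date.now()) {
  if (path === "/robots.txt") {
    robotsRequesters.set(ip, now + ROBOTS_TTL_MS);
  }
}

// Returns false while the IP is flagged as having fetched /robots.txt.
function shouldCountPageView(ip, now = Date.now()) {
  const expiresAt = robotsRequesters.get(ip);
  if (expiresAt === undefined) return true;
  if (expiresAt <= now) {
    robotsRequesters.delete(ip); // expired: treat as a normal visitor again
    return true;
  }
  return false;
}
```

In a request pipeline, `recordRequest` would run for every request, and `shouldCountPageView` would be checked just before incrementing an article's counter.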