
Exclude bots and spiders from a View counter in PHP

I have built a pretty basic advertisement manager for a website in PHP.

I say basic because it is nowhere near as complex as Google or Facebook ads, or even most high-end ad servers. It doesn't handle payments or user targeting.

It serves its purpose for my low-traffic site, though: it simply shows a random banner ad and counts impressions (views) and clicks.

Features:

  • Ad slot/position on page
  • Banner image
  • Name
  • View/impression counter
  • Click counter
  • Start and end date, or never ending
  • Disable/enable ad

I want to gradually add more functionality to the system, though.

One thing I have noticed is that the impression/view counter often seems inflated.

I believe this is caused by social networks' spiders and bots, as well as search engine spiders.

For example, if someone enters the URL of a page on my website into Facebook, Google+, Twitter, LinkedIn, Pinterest, or another network, that site will often spider my site to gather the web page's title, images, and description.

I would really like to stop these requests from counting as advertisement impressions/views when no actual human is viewing the page.

I realize it will be very hard to detect all of them, but if there is a way to catch the majority, it will at least make my stats a little more accurate.

So I am reaching out for any help or ideas on how to achieve this. Please do not suggest using another advertisement system; that is not in the cards. Thank you.

asked Jul 07 '13 19:07 by JasonDavis


1 Answer

You need to serve the ads with JavaScript. That is the most reliable way to exclude most crawlers: browsers load page dependencies such as images, JS and CSS, while the vast majority of robots fetch only the HTML and never execute scripts.
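As a rough sketch of that idea, the PHP below prints an empty placeholder plus a small inline script. The script builds the banner and pings a tracking URL, so a client that only downloads the HTML never reaches the counter. The function name `renderAdSlot`, its parameters and the `/track.php` endpoint are all assumptions made for illustration; the endpoint would be the only place where the view counter is incremented.

```php
<?php
// Hypothetical helper: builds the markup for one ad slot. The banner and the
// impression ping are created by JavaScript, so a client that never runs
// scripts does not trigger the counter. "/track.php" is an assumed endpoint.
function renderAdSlot(int $adId, string $imageUrl, string $clickUrl): string
{
    // json_encode with the HEX flags yields literals that are safe inside <script>.
    $config = json_encode(
        ['id' => $adId, 'img' => $imageUrl, 'href' => $clickUrl],
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
    );
    $slotId = 'ad-slot-' . $adId;

    return <<<HTML
<div id="{$slotId}"></div>
<script>
(function () {
    var ad = {$config};
    var link = document.createElement('a');
    link.href = ad.href;
    var img = document.createElement('img');
    img.src = ad.img;
    link.appendChild(img);
    document.getElementById('{$slotId}').appendChild(link);
    // Impression ping: only clients that execute JavaScript reach this line.
    new Image().src = '/track.php?ad=' + encodeURIComponent(ad.id);
}());
</script>
HTML;
}
```

In a template this might be used as `echo renderAdSlot(7, '/img/banner.png', '/click.php?ad=7');`. The trade-off is that visitors with JavaScript disabled see no ad at all.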

You can also do this:

// Basic crawler detection (no legitimate browser should match this pattern).
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if ($userAgent !== '' && preg_match('~bot|crawl|spider~i', $userAgent)) {
    // This is a crawler: do not show ads (or count an impression) here.
}

You'll have much better stats this way. Use JS for ads.
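If you keep the counting in PHP, the User-Agent check could be wrapped in a small helper and applied right before the counter is incremented. This is only a sketch: `isLikelyBot` and `countImpression` are hypothetical names, the token list is a short assumption to be extended from your own logs (`facebookexternalhit` is included because Facebook's link-preview fetcher does not contain "bot" or "crawl"), and the `$counts` array stands in for whatever storage the ad manager really uses.

```php
<?php
// Returns true when the User-Agent looks like an automated client.
// The token list is an assumption and deliberately short.
function isLikelyBot(?string $userAgent): bool
{
    // Treating a missing or empty User-Agent as a bot is an assumption:
    // mainstream browsers always send one.
    if ($userAgent === null || trim($userAgent) === '') {
        return true;
    }
    return preg_match('~bot|crawl|spider|slurp|facebookexternalhit~i', $userAgent) === 1;
}

// Hypothetical counter: adds one view for an ad unless the client looks like a bot.
// Returns true when the impression was counted.
function countImpression(array &$counts, int $adId, ?string $userAgent): bool
{
    if (isLikelyBot($userAgent)) {
        return false; // the banner can still be shown, it just is not counted
    }
    $counts[$adId] = ($counts[$adId] ?? 0) + 1;
    return true;
}
```

A call such as `countImpression($counts, $adId, $_SERVER['HTTP_USER_AGENT'] ?? null);` would then skip the obvious crawlers. Clients that fake a browser User-Agent will still get through, which is why this works best alongside the JavaScript approach.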

PS: You could also try setting a cookie in JS and checking for it later. Crawlers may accept cookies sent by PHP in the HTTP headers, but they will almost always miss one set in JS, because they would have to load and interpret the script, and that is normally done only by browsers.
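A minimal sketch of the cookie idea, assuming a marker cookie named `js_ok` (an arbitrary choice) and two hypothetical helpers: one prints the inline script that sets the cookie, the other checks whether the cookie came back with the request.

```php
<?php
// Name of the marker cookie; "js_ok" is an arbitrary, hypothetical choice.
const JS_COOKIE_NAME = 'js_ok';

// Inline script to print once per page. document.cookie is only written by
// clients that actually execute JavaScript.
function jsCookieSnippet(): string
{
    return '<script>document.cookie = "' . JS_COOKIE_NAME . '=1; path=/; max-age=86400";</script>';
}

// True when the marker cookie was sent with the request, i.e. an earlier
// page view by this client ran the script above.
function hasJsCookie(array $cookies): bool
{
    return isset($cookies[JS_COOKIE_NAME]) && $cookies[JS_COOKIE_NAME] === '1';
}
```

The counter would then be guarded with `if (hasJsCookie($_COOKIE)) { ... }`. Keep in mind that a visitor's very first page view arrives before the cookie exists, so with this check alone that view goes uncounted.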

answered Oct 27 '22 15:10 by CodeAngry