
Detect Search Crawlers via JavaScript

How would I go about detecting search crawlers? I ask because I want to suppress certain JavaScript calls if the user agent is a bot.

I have found an example of how to detect a certain browser, but I am unable to find examples of how to detect a search crawler:

/MSIE (\d+\.\d+);/.test(navigator.userAgent); //test for MSIE x.x

Examples of search crawlers I want to block:

Google
    Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    Googlebot/2.1 (+http://www.googlebot.com/bot.html)
    Googlebot/2.1 (+http://www.google.com/bot.html)

Baidu
    Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
    Baiduspider+(+http://www.baidu.com/search/spider.htm)
    BaiDuSpider
Jon asked Nov 19 '13 23:11




1 Answer

This is the regex that the Ruby user-agent library agent_orange uses to test whether a userAgent looks like a bot. You can narrow it down to specific bots by consulting a list of bot userAgent strings:

/bot|crawler|spider|crawling/i 
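As a quick check, the pattern above can be run against the user agent strings from the question. This is only a sketch: the helper name looksLikeBot and the Firefox string used as a non-bot sample are assumptions for illustration, not part of agent_orange.

```javascript
// Same pattern as quoted above (no "g" flag, so test() keeps no state between calls).
const BOT_PATTERN = /bot|crawler|spider|crawling/i;

// Hypothetical helper: true when the user agent string matches the pattern.
function looksLikeBot(userAgent) {
  return BOT_PATTERN.test(userAgent);
}

const samples = [
  // Crawler strings taken from the question.
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
  'Baiduspider+(+http://www.baidu.com/search/spider.htm)',
  'BaiDuSpider',
  // Illustrative desktop browser string (an assumption, not from the question).
  'Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0',
];

const results = samples.map(looksLikeBot);
console.log(results);
```

Because the match is a plain substring test, any user agent that happens to contain "bot" or "spider" will also be flagged, and a crawler that sends a browser-like string will not be.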

For example, if you have some object, util.browser, you can use it to store what type of client the user is on:

util.browser = {
    bot: /bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent),
    mobile: ...,
    desktop: ...
};
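To tie this back to the question, here is a minimal sketch of narrowing the check to the two crawlers listed there and skipping a call when one is detected. The names isSearchCrawler and runUnlessCrawler are hypothetical, and the user agent is passed in as a parameter so the code can also be tried outside a browser.

```javascript
// Narrowed pattern covering only the crawlers named in the question (an assumption;
// extend it with other bot names as needed).
const SEARCH_CRAWLER_PATTERN = /googlebot|baiduspider/i;

// Hypothetical helper: true when the user agent names one of the listed crawlers.
function isSearchCrawler(userAgent) {
  return SEARCH_CRAWLER_PATTERN.test(userAgent || '');
}

// Hypothetical helper: runs callback only for non-crawler user agents.
// Returns true if the callback ran, false if it was suppressed.
function runUnlessCrawler(userAgent, callback) {
  if (isSearchCrawler(userAgent)) {
    return false;
  }
  callback();
  return true;
}

// Example with a crawler string from the question: the callback is skipped.
let calls = 0;
const ranForBot = runUnlessCrawler(
  'Googlebot/2.1 (+http://www.google.com/bot.html)',
  () => { calls += 1; }
);

// Example with an illustrative browser string: the callback runs.
const ranForBrowser = runUnlessCrawler(
  'Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0',
  () => { calls += 1; }
);
```

In a page you would pass navigator.userAgent as the first argument. Keep in mind that user agent strings can be spoofed, so this only filters crawlers that identify themselves.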
megawac answered Sep 24 '22 07:09