I am wondering how I would go about detecting search crawlers. The reason I ask is that I want to suppress certain JavaScript calls if the user agent is a bot.
I have found an example of how to detect a certain browser, but I am unable to find examples of how to detect a search crawler:
/MSIE (\d+\.\d+);/.test(navigator.userAgent); //test for MSIE x.x
Examples of search crawlers I want to block:
Google:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Googlebot/2.1 (+http://www.googlebot.com/bot.html)
Googlebot/2.1 (+http://www.google.com/bot.html)

Baidu:
Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
BaiDuSpider
There are two methods for verifying Google's crawlers:
Manually: For one-off lookups, use command line tools. This method is sufficient for most use cases.
Automatically: For large scale lookups, use an automatic solution to match a crawler's IP address against the list of published Googlebot IP addresses.
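The manual reverse/forward DNS check that command line tools perform can also be scripted on the server. The following is only a minimal Node.js sketch: the dns.promises calls are standard Node APIs, but the isGooglebot function name and the googlebot.com / google.com suffix test are assumptions based on Google's published verification guidance, not code from this answer.

const dns = require('dns').promises;

async function isGooglebot(ip) {
  try {
    // Reverse DNS: a genuine Googlebot IP resolves to a googlebot.com or google.com host
    const [hostname] = await dns.reverse(ip);
    if (!/\.(googlebot|google)\.com$/.test(hostname)) return false;

    // Forward DNS: the hostname must resolve back to the original IP
    const { address } = await dns.lookup(hostname);
    return address === ip;
  } catch (err) {
    return false; // lookup failed, treat as unverified
  }
}

// usage (server-side): isGooglebot(requestIp).then(verified => ...)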
As early as 2008, Google was successfully crawling JavaScript, though probably in a limited fashion. Today, it is clear that Google has not only expanded the types of JavaScript it crawls and indexes, but has also made significant strides in rendering complete web pages (especially in the last 12-18 months).
Googlebot processes JavaScript web apps in three main phases: crawling, rendering, and indexing.
This is the regex the Ruby UA library agent_orange uses to test whether a userAgent looks to be a bot. You can narrow it down for specific bots by referencing the bot userAgent list here:
/bot|crawler|spider|crawling/i
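As a quick sanity check, this regex catches the Googlebot user agent string from the question (pasted in below purely for illustration):

/bot|crawler|spider|crawling/i.test('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'); // true, since "Googlebot" contains "bot"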
For example, if you have some object, util.browser, you can store what type of device a user is on:
util.browser = {
  bot: /bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent),
  mobile: ...,
  desktop: ...
}
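Tying this back to the original question, a minimal sketch of suppressing a JavaScript call for bots could look like the snippet below; loadAnalytics is a hypothetical function standing in for whatever call you want to skip.

if (!util.browser.bot) {
  loadAnalytics(); // hypothetical call; only runs for real visitors, not crawlers
}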