 

Why use robots.txt on JavaScript files?

Tags:

robots.txt

Is there any reason you should or shouldn't allow access to JavaScript or CSS files? Specifically, common files such as jQuery.

asked Jan 16 '23 by Ray

1 Answer

It's widely accepted that search engines allocate a certain amount of bandwidth, or a certain number of URLs, to a given site per day. Some webmasters therefore like to block JS, CSS, and boilerplate images from the search engines to conserve that budget, so Google or Bing will crawl more pages instead of unnecessary assets.
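
For illustration, a crawl-budget-motivated robots.txt along those lines might look like the sketch below. The directory names are hypothetical; swap in whatever paths your own assets live under. Each Disallow line is a simple path prefix, so any URL starting with one of those prefixes is skipped by compliant crawlers.

    User-agent: *
    # Hypothetical asset directories a webmaster might block to save crawl budget
    Disallow: /js/
    Disallow: /css/
    Disallow: /images/boilerplate/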

Googler Matt Cutts has asked in the past that webmasters not do this (http://www.seroundtable.com/googlebot-javascript-css-14930.html).

It appears that Google would like to know exactly how your site behaves, with and without JavaScript. There's plenty of evidence that they're rendering the entire page and executing JavaScript that runs on page load (e.g. Facebook comments).

If you block even common jQuery files, Google can't tell whether it's a stock jQuery implementation or whether you've modified the core files, and thereby changed the experience.
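
If you want to go the other way and make it explicit that crawlers may fetch these files, Googlebot and Bingbot both honor Allow rules, and the more specific (longer) rule wins when Allow and Disallow overlap. A minimal sketch, assuming a hypothetical /assets/ directory that is otherwise blocked:

    User-agent: Googlebot
    # The longer Allow rules are more specific than the Disallow below,
    # so JS and CSS stay crawlable even though /assets/ is blocked.
    Allow: /assets/js/
    Allow: /assets/css/
    Disallow: /assets/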

My suggestion would be to make sure all your JS, CSS, and boilerplate images are served off a separate domain or CNAME. I would monitor Googlebot's crawl through logs and Google Webmaster Tools, and observe whether or not they're spending a lot of time and bandwidth crawling these assets. If not, then just let them keep crawling.
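
One nice side effect of the separate hostname is that it gets its own robots.txt, so you can keep the asset host wide open without touching the main site's rules. A sketch, assuming a hypothetical static.example.com:

    # robots.txt served from static.example.com/robots.txt
    User-agent: *
    # An empty Disallow value permits crawling of everything on this host
    Disallow: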

As each site behaves differently, you could experiment by blocking some of the more heavily requested files that are eating up a large amount of bandwidth, and then watch whether Google's "pages crawled" count increases.
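
If you do run that experiment, keep the block as narrow as possible so you can attribute any change. A sketch, assuming a hypothetical heavy bundle you have identified from your logs:

    User-agent: *
    # Hypothetical: block only the single heaviest file, then watch the
    # "pages crawled per day" graph in Webmaster Tools for a few weeks.
    Disallow: /js/vendor.bundle.min.js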

answered Jan 28 '23 by eywu