 

What is the smartest way to handle robots.txt in Express?

I'm currently working on an application built with Express (Node.js), and I want to know the smartest way to handle a different robots.txt for each environment (development, production).

This is what I have right now, but I'm not convinced by the solution; it feels dirty:

app.get '/robots.txt', (req, res) ->
  res.set 'Content-Type', 'text/plain'
  if app.settings.env == 'production'
    res.send 'User-agent: *\nDisallow: /signin\nDisallow: /signup\nDisallow: /signout\nSitemap: /sitemap.xml'
  else
    res.send 'User-agent: *\nDisallow: /'

(NB: it is CoffeeScript)

There should be a better way. How would you do it?

Thank you.

Vinch asked Feb 27 '13




2 Answers

Use a middleware function. This way robots.txt is handled before any session middleware, cookieParser, etc.:

app.use('/robots.txt', function (req, res, next) {
    res.type('text/plain');
    res.send("User-agent: *\nDisallow: /");
});
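
Since the question is about per-environment rules, here is a minimal sketch of my own (not part of this answer) that combines the middleware approach with the asker's environment check; the rule strings are copied from the question:

// Pick the rules once at startup based on the Express environment
// (app.get('env') defaults to NODE_ENV, or 'development' if unset),
// then serve them from the middleware.
var robots = app.get('env') === 'production'
    ? 'User-agent: *\nDisallow: /signin\nDisallow: /signup\nDisallow: /signout\nSitemap: /sitemap.xml'
    : 'User-agent: *\nDisallow: /';

app.use('/robots.txt', function (req, res) {
    res.type('text/plain');
    res.send(robots);
});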

With Express 4, app.get is handled in the order it appears, so you can just use that:

app.get('/robots.txt', function (req, res) {
    res.type('text/plain');
    res.send("User-agent: *\nDisallow: /");
});
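
If you would rather keep the rules in files than in strings, a variant of the same route could pick a file per environment. This is my own sketch, not the answer's code; the file names are assumptions:

var path = require('path');

app.get('/robots.txt', function (req, res) {
    // Hypothetical file names; adjust to your own project layout.
    var file = app.get('env') === 'production'
        ? 'robots.production.txt'
        : 'robots.development.txt';
    // res.sendFile is Express 4+; the headers option sets the Content-Type.
    res.sendFile(path.join(__dirname, file), {
        headers: { 'Content-Type': 'text/plain' }
    });
});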
SystemParadox answered Oct 13 '22


1. Create a robots.txt file with the following content:

User-agent: *
Disallow:
# your rules here

2. Add it to the public/ directory.

3. If not already present in your code, add:

app.use(express.static('public'));

Your robots.txt will be available to any crawler at http://yoursite.com/robots.txt.
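
To make this static-file approach environment-aware (the question's actual concern), one option is to serve a different directory per environment, each carrying its own robots.txt. This is my own sketch; the public-dev directory name is an assumption:

var express = require('express');
var app = express();

// Each directory holds its own robots.txt along with other static assets.
var staticDir = app.get('env') === 'production' ? 'public' : 'public-dev';
app.use(express.static(staticDir));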

atul answered Oct 13 '22