I'm currently working on an application built with Express (Node.js) and I want to know the smartest way to serve a different robots.txt for different environments (development, production).
This is what I have right now, but I'm not convinced by the solution; I think it is dirty:
app.get '/robots.txt', (req, res) ->
  res.set 'Content-Type', 'text/plain'
  if app.settings.env == 'production'
    res.send 'User-agent: *\nDisallow: /signin\nDisallow: /signup\nDisallow: /signout\nSitemap: /sitemap.xml'
  else
    res.send 'User-agent: *\nDisallow: /'
(NB: it is CoffeeScript)
There should be a better way. How would you do it?
Thank you.
You may want to block URLs in robots.txt to keep crawlers away from private photos, expired special offers, or other pages that you're not ready for users to access. Blocking URLs this way can also help your SEO efforts by focusing crawl activity on the pages that matter.
A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page.
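If the goal is to keep a page out of search results rather than merely uncrawled, one option is to send a noindex directive from Express itself. The sketch below is illustrative only and not part of the original answers; it assumes an Express 4 app, borrows the /signin route from the question, and uses the standard X-Robots-Tag response header to express noindex.

// Hypothetical sketch: mark a page as noindex so crawlers drop it from their index.
// Blocking the URL in robots.txt alone would only stop them from crawling it.
app.get('/signin', function (req, res) {
  // X-Robots-Tag is honored by major search engines and works for any content type.
  res.set('X-Robots-Tag', 'noindex');
  res.render('signin'); // assumes a view named 'signin' exists
});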
Use a middleware function. This way the robots.txt will be handled before any session, cookieParser, etc:
app.use('/robots.txt', function (req, res, next) {
  res.type('text/plain');
  res.send("User-agent: *\nDisallow: /");
});
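To make the ordering point concrete, here is a hedged sketch of how that handler might sit in front of the rest of the middleware stack; the cookie-parser and express-session packages are only examples I'm assuming here, not part of the original answer.

// Hypothetical ordering sketch: because the robots.txt handler is registered first
// and ends the response without calling next(), requests for /robots.txt never
// reach the heavier middleware registered below it.
var cookieParser = require('cookie-parser');
var session = require('express-session');

app.use('/robots.txt', function (req, res, next) {
  res.type('text/plain');
  res.send("User-agent: *\nDisallow: /");
});

app.use(cookieParser());
app.use(session({ secret: 'example-secret', resave: false, saveUninitialized: false }));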
With Express 4, app.get now gets handled in the order it appears, so you can just use that:
app.get('/robots.txt', function (req, res) {
  res.type('text/plain');
  res.send("User-agent: *\nDisallow: /");
});
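To tie this back to the original question about environments, here is a hedged sketch (an extension of this answer, not part of it) that picks the response body based on Express's env setting; the two bodies are taken from the question's CoffeeScript.

// Hypothetical sketch: serve the permissive rules in production and block everything
// elsewhere. app.get('env') reads NODE_ENV and defaults to 'development'.
app.get('/robots.txt', function (req, res) {
  res.type('text/plain');
  if (app.get('env') === 'production') {
    res.send('User-agent: *\nDisallow: /signin\nDisallow: /signup\nDisallow: /signout\nSitemap: /sitemap.xml');
  } else {
    res.send('User-agent: *\nDisallow: /');
  }
});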
Create a robots.txt file with the following content:

User-agent: *
Disallow: # your rules here

Put it in your project's public/ directory and serve that directory statically:

app.use(express.static('public'))

Your robots.txt will then be available to any crawler at http://yoursite.com/robots.txt
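If you still need different rules per environment with this static-file approach, one possibility (my assumption, not part of the original answer) is to keep one file per environment and pick it with res.sendFile, which is available in Express 4.8+; the robots/ directory and file names below are hypothetical.

// Hypothetical layout: robots/production.txt and robots/development.txt next to this file.
// res.sendFile needs an absolute path, hence the path.join with __dirname, and it sets
// Content-Type to text/plain automatically from the .txt extension.
var path = require('path');

app.get('/robots.txt', function (req, res) {
  var env = app.get('env') === 'production' ? 'production' : 'development';
  res.sendFile(path.join(__dirname, 'robots', env + '.txt'));
});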