
Different robots.txt for staging server on Heroku

I have staging and production apps on Heroku.

For crawlers, I set up a robots.txt file.

After that, I received this message from Google:

Dear Webmaster, The host name of your site, https://www.myapp.com/, does not match any of the "Subject Names" in your SSL certificate, which were:
*.herokuapp.com
herokuapp.com

Googlebot read the robots.txt on my staging app and sent this message, because I didn't do anything to prevent crawlers from reading the file.

So what I'm thinking of is varying the .gitignore file between staging and production, but I can't figure out how to do this.

What are the best practices for implementing this?

EDIT

I googled this and found this article: http://goo.gl/2ZHal

The article says to set up basic Rack authentication, and then you won't need to worry about robots.txt at all.

I didn't know that basic auth could block Googlebot. This solution seems better than manipulating the .gitignore file.
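For reference, the basic-auth approach the article describes can be sketched with `Rack::Auth::Basic` in a staging-only environment file. This is a hedged sketch, not code from the article: the `MyApp::Application` name, the dedicated `staging` Rails environment, and the credentials are all assumptions you would adapt to your app.

```ruby
# config/environments/staging.rb -- sketch only; app name and
# credentials below are placeholders, not from the question.
MyApp::Application.configure do
  # Rack::Auth::Basic answers 401 Unauthorized for every request
  # that lacks valid credentials, so Googlebot never gets as far as
  # robots.txt (or any other page) on staging.
  config.middleware.use Rack::Auth::Basic, "Staging" do |username, password|
    username == "staging" && password == ENV["STAGING_PASSWORD"]
  end
end
```

With this in place the staging site is effectively invisible to crawlers regardless of what its robots.txt says, which is why the article considers a separate staging robots.txt unnecessary.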

asked Aug 05 '12 by Atsuhiro Teshima


1 Answer

A great solution with Rails 3 is to use Rack. Here is a great post that outlines the process: Serving Different Robots.txt Using Rack. To summarize, you add this to your routes.rb:

 # config/routes.rb
 require 'robots_generator' # Rails 3 does not autoload files in lib 
 match "/robots.txt" => RobotsGenerator

and then create a new file at lib/robots_generator.rb:

# lib/robots_generator.rb
class RobotsGenerator
  # Use the config/robots.txt in production.
  # Disallow everything for all other environments.
  # http://avandamiri.com/2011/10/11/serving-different-robots-using-rack.html
  def self.call(env)
    body = if Rails.env.production?
      File.read Rails.root.join('config', 'robots.txt')
    else
      "User-agent: *\nDisallow: /"
    end

    # Heroku can cache content for free using Varnish.
    headers = { 'Cache-Control' => "public, max-age=#{1.month.seconds.to_i}" }

    [200, headers, [body]]
  rescue Errno::ENOENT
    [404, {}, ['# A robots.txt is not configured']]
  end
end

Finally, make sure to move robots.txt into your config folder (or wherever you specify in your RobotsGenerator class).
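For completeness, here is what a minimal config/robots.txt for production might contain. The rules and sitemap URL are placeholders, not taken from the question; adjust them to your own site:

```text
# config/robots.txt -- served only when Rails.env.production?
User-agent: *
Disallow:
Sitemap: https://www.myapp.com/sitemap.xml
```

An empty `Disallow:` allows crawlers everywhere in production, while the non-production branch of RobotsGenerator above serves `Disallow: /` to block them on staging.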

answered Oct 30 '22 by stereoscott