I'm trying to render a dynamic text file (robots.txt) in my Rails (3.0.10) app, but it keeps getting rendered as HTML (according to the console).
Route:
match 'robots.txt' => 'sites#robots'
Controller:
class SitesController < ApplicationController
  respond_to :html, :js, :xml, :css, :txt

  def robots
    @site = Site.find_by_subdomain # blah blah
  end
end
app/views/sites/robots.txt.erb:
Sitemap: <%= @site.url %>/sitemap.xml
But when I visit http://www.example.com/robots.txt, I get a blank page/source, and the log says:
Started GET "/robots.txt" for 127.0.0.1 at 2011-11-21 11:22:13 -0500
Processing by SitesController#robots as HTML
Site Load (0.4ms) SELECT `sites`.* FROM `sites` WHERE (`sites`.`subdomain` = 'blah') ORDER BY created_at DESC LIMIT 1
Completed 406 Not Acceptable in 828ms
Any idea what I'm doing wrong?
Note: I added this to config/initializers/mime_types, because Rails was complaining about not knowing what the .txt MIME type was:
Mime::Type.register_alias "text/plain", :txt
Note 2: I did remove the stock robots.txt from the public directory.
NOTE: This is a repost from coderwall.
Based on some advice from a similar answer on Stack Overflow, I currently use the following solution to render a dynamic robots.txt based on the request's host parameter.
# config/routes.rb
#
# Dynamic robots.txt
get 'robots.:format' => 'robots#index'
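If you want to be even more explicit about the format, a small variation (my assumption, not part of the original setup) is to route the literal /robots.txt path and set a default format, so Rails always resolves the .text.erb templates:

# config/routes.rb
#
# Hypothetical variation: route the literal path and force the text format
get 'robots.txt' => 'robots#index', defaults: { format: 'text' }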
# app/controllers/robots_controller.rb
class RobotsController < ApplicationController
  # No layout
  layout false

  # Render a robots.txt file based on whether the request
  # is performed against a canonical url or not
  # Prevent robots from indexing content served via a CDN twice
  def index
    if canonical_host?
      render 'allow'
    else
      render 'disallow'
    end
  end

  private

  def canonical_host?
    request.host =~ /plugingeek\.com/
  end
end
Based on request.host, we render one of two different .text.erb view files.
Allowing robots
# app/views/robots/allow.text.erb # Note the .text extension
# Allow robots to index the entire site except some specified routes
# rendered when site is visited with the default hostname
# http://www.robotstxt.org/
# ALLOW ROBOTS
User-agent: *
Disallow:
Banning spiders
# app/views/robots/disallow.text.erb # Note the .text extension
# Disallow robots to index any page on the site
# rendered when robot is visiting the site
# via the Cloudfront CDN URL
# to prevent duplicate indexing
# and search results referencing the Cloudfront URL
# DISALLOW ROBOTS
User-agent: *
Disallow: /
Testing the setup with RSpec and Capybara can be done quite easily, too.
# spec/features/robots_spec.rb
require 'spec_helper'
feature "Robots" do
context "canonical host" do
scenario "allow robots to index the site" do
Capybara.app_host = 'http://www.plugingeek.com'
visit '/robots.txt'
Capybara.app_host = nil
expect(page).to have_content('# ALLOW ROBOTS')
expect(page).to have_content('User-agent: *')
expect(page).to have_content('Disallow:')
expect(page).to have_no_content('Disallow: /')
end
end
context "non-canonical host" do
scenario "deny robots to index the site" do
visit '/robots.txt'
expect(page).to have_content('# DISALLOW ROBOTS')
expect(page).to have_content('User-agent: *')
expect(page).to have_content('Disallow: /')
end
end
end
# This would be the resulting docs
#
# Robots
#   canonical host
#     allow robots to index the site
#   non-canonical host
#     deny robots to index the site
As a last step, you might need to remove the static public/robots.txt if it's still present.
I hope you find this useful. Feel free to comment and help improve this technique even further.
One solution that works in Rails 3.2.3 (not sure about 3.0.10) is as follows:
1) Name your template file robots.text.erb (emphasis on text vs. txt).
2) Set up your route like this: match '/robots.:format' => 'sites#robots'
3) Leave your action as is (you can remove the respond_to in the controller):
def robots
  @site = Site.find_by_subdomain # blah blah
end
This solution also eliminates the need to explicitly specify txt.erb in the render call mentioned in the accepted answer.
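For reference, here is a rough sketch of the kind of explicit format handling that step 3 lets you skip (my approximation under the .text.erb naming from step 1, not the accepted answer's exact render call):

def robots
  @site = Site.find_by_subdomain # blah blah
  respond_to do |format|
    format.text { render :layout => false } # explicitly renders robots.text.erb
  end
end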