
What to do about robots.txt for my CakePHP app?

I have been getting the following on my live CakePHP web app:

2014-02-18 04:06:00 Error: [MissingControllerException] Controller class Robots.txtController could not be found.
Exception Attributes: array (
  'class' => 'Robots.txtController',
  'plugin' => NULL,
)
Request URL: /robots.txt

What should I do?

I am using CakePHP 2.4.2

UPDATE:

This is my robots.txt. Anything else I should add? I placed it in webroot.

User-agent: *
Disallow: /admin/
Kim Stacks asked Feb 28 '14 10:02




2 Answers

Copy the robots.txt into the /app/webroot/ directory.

Harald Ernst answered Oct 21 '22 12:10


You were getting the error because a bot or other client requested /robots.txt and no such file existed in webroot, so the request fell through to CakePHP, which looked for a controller named Robots.txtController. Now that you have created robots.txt in webroot, you should no longer receive the error message. You can check this yourself by going to:

http://www.example.com/robots.txt

I would probably remove the /admin/ entry; you don't want to advertise where your backend is!

A simple file like the following should be sufficient. Remove the Sitemap line if you don't have one:

User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
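If you want to see what a given set of rules actually permits before deploying it, a rough sketch along these lines may help. It uses Python's standard urllib.robotparser rather than anything from CakePHP; the helper name is_allowed and the paths /admin/users and /posts are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from the question and from this answer, as plain strings.
QUESTION_RULES = "User-agent: *\nDisallow: /admin/\n"
ANSWER_RULES = (
    "User-agent: *\n"
    "Disallow:\n"
    "\n"
    "Sitemap: http://www.example.com/sitemap.xml\n"
)


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper: it parses the text locally and makes no network request.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    base = "http://www.example.com"
    # /admin/users and /posts are assumed example paths, not from the real app.
    for label, rules in (("question", QUESTION_RULES), ("answer", ANSWER_RULES)):
        for path in ("/admin/users", "/posts"):
            print(label, path, is_allowed(rules, base + path))
```

An empty Disallow: line means nothing is blocked, so the second set of rules lets crawlers fetch everything, while the first blocks only paths under /admin/.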

Hope you find this helpful.

Progredi Digital answered Oct 21 '22 14:10