robots.txt in subdirectory

I have a project that lives in a folder below the main domain, and I don't have access to the root of the domain itself.

http://mydomain.com/myproject/

I want to disallow indexing of the subfolder "forbidden":

http://mydomain.com/myproject/forbidden/

Can I simply put a robots.txt in the myproject folder? Will it get read even if there is no robots.txt in the root?

What is the correct syntax for disallowing the forbidden folder?

User-agent: *
Disallow: /forbidden/

or

User-agent: *
Disallow: forbidden/

magnattic asked Jan 29 '11 14:01



2 Answers

From robotstxt.org:

Where to put it

The short answer: in the top-level directory of your web server.

The longer answer:

When a robot looks for the "/robots.txt" file for URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place.

For example, for "http://www.example.com/shop/index.html", it will remove the "/shop/index.html", and replace it with "/robots.txt", and will end up with "http://www.example.com/robots.txt".

So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

Remember to use all lower case for the filename: "robots.txt", not "Robots.TXT".

So I'm afraid the answer is that you have to put it in the root folder :-(
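
To make the path-stripping rule quoted above concrete, here is a minimal Python sketch of how a crawler might derive the robots.txt location from any page URL. The helper name `robots_txt_url` is my own for illustration, and real crawlers may differ in details; the point is only that the path is thrown away, so a file at /myproject/robots.txt is never requested.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url.

    Hypothetical helper: keeps the scheme and host, drops the path,
    query and fragment, and puts "/robots.txt" in their place.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The example from robotstxt.org
print(robots_txt_url("http://www.example.com/shop/index.html"))
# http://www.example.com/robots.txt

# The URL from the question: the project folder plays no role
print(robots_txt_url("http://mydomain.com/myproject/forbidden/"))
# http://mydomain.com/robots.txt
```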

With regard to your second question, I believe the correct syntax is the one starting with a forward slash (e.g. /forbidden/).

Klaus Byskov Pedersen answered Oct 08 '22 02:10


Unfortunately, you can't. A robots.txt file can only go at the root of the domain.

Perhaps the owner of the domain will oblige if you ask kindly?

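The first syntax (with the leading slash) is the correct one, but remember that the path must be absolute from the root of the domain. In your case that means Disallow: /myproject/forbidden/ rather than Disallow: /forbidden/.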
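
If you want to check a rule before handing it to the domain owner, a rough sketch with Python's standard `urllib.robotparser` looks like this. The helper `is_allowed` and the page name `page.html` are made up for the example, and this only tests how the rules match; the file itself still has to be served from http://mydomain.com/robots.txt for crawlers to see it.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, agent="*"):
    """Hypothetical helper: parse robots.txt lines and test one URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Rule written as an absolute path from the root of the domain
rooted = ["User-agent: *", "Disallow: /myproject/forbidden/"]

# Rule written as if the project folder were the root
project_relative = ["User-agent: *", "Disallow: /forbidden/"]

# Assumed example page inside the forbidden folder
url = "http://mydomain.com/myproject/forbidden/page.html"

print(is_allowed(rooted, url))            # False: the page is blocked
print(is_allowed(project_relative, url))  # True: the rule does not match
```

The second rule only matches paths that begin with /forbidden/ directly under the domain, which is why it leaves the project's folder crawlable.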

Alec Gorge answered Oct 08 '22 04:10