 

How to set up a robots.txt which only allows the default page of a site


Say I have a site at http://example.com. I would really like to allow bots to see the home page, but every other page needs to be blocked, as it is pointless to spider them. In other words:

http://example.com & http://example.com/ should be allowed, but http://example.com/anything and http://example.com/someendpoint.aspx should be blocked.

Further, it would be great if I could allow certain query strings to pass through to the home page: http://example.com?okparam=true

but not http://example.com?anythingbutokparam=true

Boaz asked Sep 04 '08
2 Answers

So after some research, here is what I found - a solution acceptable to the major search providers: Google, Yahoo & MSN (I could only find a validator here):

User-Agent: *
Disallow: /*
Allow: /?okparam=
Allow: /$

The trick is using the $ to mark the end of the URL.
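If you want to sanity-check rules like these before deploying them, here is a minimal Python sketch of the wildcard matching. The rule set and test URLs come from this page; the rest, including using raw pattern length for the longest-match tie-breaking, is a simplified stand-in for Google's documented "most specific rule wins, Allow wins ties" behavior, so treat it as an approximation rather than a validator:

import re

# The rules from the answer above.
RULES = [
    ("Disallow", "/*"),
    ("Allow", "/?okparam="),
    ("Allow", "/$"),
]

def pattern_to_regex(pattern):
    # Trailing '$' anchors the end of the URL; '*' becomes '.*';
    # everything else is matched literally.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))

def is_allowed(path):
    matches = [
        (len(pat), verb == "Allow")
        for verb, pat in RULES
        if pattern_to_regex(pat).match(path)
    ]
    if not matches:
        return True  # no rule applies, so crawling is permitted
    # Longest pattern wins; on a tie, True (Allow) sorts above False.
    return max(matches)[1]

for path in ["/", "/?okparam=true", "/anything",
             "/someendpoint.aspx", "/?anythingbutokparam=true"]:
    print(path, "->", "allowed" if is_allowed(path) else "blocked")

Running this prints "allowed" only for / and /?okparam=true, which matches the behavior asked for in the question.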

Boaz answered Sep 22 '22


Google's Webmaster Tools reports that Disallow always takes precedence over Allow, so there's no easy way of doing this in a robots.txt file.

You could accomplish this by putting a noindex,nofollow META tag in the HTML of every page but the home page.
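For reference (standard HTML, not tied to any particular framework), the tag goes in the <head> of every page you want kept out of the index:

<meta name="robots" content="noindex, nofollow">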

ceejayoz answered Sep 24 '22