Today, whilst improving my web crawler to support the robots.txt standard, I came across the following at http://www.w3schools.com/robots.txt:
User-agent: Mediapartners-Google
Disallow:
Is this syntax correct? Shouldn't it be Disallow: /
or Allow: /
depending on the intended purpose?
An empty Disallow line means you're not disallowing anything, so a spider can access all sections of your site.
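Python's standard-library urllib.robotparser will parse that record directly, which makes it easy for a crawler to sanity-check this behaviour. A minimal sketch (the test URL is only an illustration):

from urllib.robotparser import RobotFileParser

# The exact record quoted in the question.
robots_txt = "User-agent: Mediapartners-Google\nDisallow:\n"

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# An empty Disallow value disallows nothing, so any path is fetchable.
print(parser.can_fetch("Mediapartners-Google", "http://www.w3schools.com/html/"))
# -> True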
A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page.
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
Disallow:
Will allow everything, as will:
Allow: /
You're either disallowing nothing, or allowing everything.
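A quick way to convince yourself of that equivalence is to run both records through urllib.robotparser and compare the verdicts; the crawler name and URL below are invented for the demonstration:

from urllib.robotparser import RobotFileParser

records = {
    "Disallow: (empty)": "User-agent: *\nDisallow:\n",
    "Allow: /": "User-agent: *\nAllow: /\n",
}

for label, record in records.items():
    parser = RobotFileParser()
    parser.parse(record.splitlines())
    print(label, "->", parser.can_fetch("MyCrawler", "http://example.com/page.html"))

# Both lines print True: disallowing nothing and allowing everything
# come to the same thing.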