
Robots.txt file in MVC.NET 4

I read an article about keeping robots away from certain URLs in my ASP.NET MVC project. The author says we should add an action to one of our controllers. In this example he adds the action to the Home controller:

#region -- Robots() Method --
public ActionResult Robots()
{
    Response.ContentType = "text/plain";
    return View();
}
#endregion

Then we should add a Robots.cshtml view to the project with this body:

@{
    Layout = null;
}
# robots.txt for @this.Request.Url.Host

User-agent: *
Disallow: /Administration/
Disallow: /Account/

Finally, we should add this route registration to Global.asax:

routes.MapRoute("Robots.txt",
                "robots.txt",
                new { controller = "Home", action = "Robots" });
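
For context, here is a minimal sketch of where that registration might sit. It assumes the default MVC 4 project template, where routes live in a `RouteConfig.RegisterRoutes` method called from Global.asax; the "Default" route shown is the template's usual one and is an assumption, not something taken from the article. Routes are matched in registration order, and the default `{controller}/{action}/{id}` pattern would otherwise treat `robots.txt` as a controller name, so the specific route goes first.

```csharp
using System.Web.Mvc;
using System.Web.Routing;

public class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

        // Specific route first: routes are evaluated in the order they are added.
        routes.MapRoute(
            name: "Robots.txt",
            url: "robots.txt",
            defaults: new { controller = "Home", action = "Robots" });

        // Assumed template default route; it must come after the robots.txt route.
        routes.MapRoute(
            name: "Default",
            url: "{controller}/{action}/{id}",
            defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional });
    }
}
```

Depending on how IIS is configured, a URL ending in `.txt` may be handed to the static file handler before routing sees it, so this route may need additional handler configuration to take effect.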

My question is: do robots crawl controllers that have the [Authorize] attribute, like Administration?

Behzad Hassani asked Jun 01 '15 16:06



2 Answers

This simple piece of code worked for my ASP.NET Core 3.1 site:

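    // Requires: using System.Text; using Microsoft.AspNetCore.Mvc;
    // Controller action that serves robots.txt via attribute routing.
    [Route("/robots.txt")]
    public ContentResult RobotsTxt()
    {
        var sb = new StringBuilder();
        sb.AppendLine("User-agent: *")
            .AppendLine("Disallow:") // an empty Disallow value blocks nothing
            .Append("sitemap: ")
            .Append(this.Request.Scheme)
            .Append("://")
            .Append(this.Request.Host)
            .AppendLine("/sitemap.xml");

        // Return the text as plain UTF-8 content.
        return this.Content(sb.ToString(), "text/plain", Encoding.UTF8);
    }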
Panayiotis Hiripis answered Sep 20 '22 10:09



Do robots crawl controllers that have the [Authorize] attribute, like Administration?

If they find a link to it, they are likely to try to crawl it, but they will fail just like anyone using a web browser who has not logged in. Robots have no special ability to access your website differently from a standard browser.

Note that robots that conform to the Robots Exclusion Standard request this exact URL:

http://mydomain/robots.txt

You can create a response for that URL however you like. One approach is certainly to have a controller that handles that request. You can also just add a text file with the same content you would have returned from the controller, e.g.

User-agent: *
Disallow: /Administration/
Disallow: /Account/

to the root folder of your project and make sure it is marked as content so that it is deployed to the website.
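
As a rough illustration of "marked as content": in a classic ASP.NET project this usually means the file's Build Action is set to Content in Visual Studio, which appears in the project file roughly as below. The fragment assumes the file is named `robots.txt` and sits in the project root; the rest of the project file is omitted.

```xml
<ItemGroup>
  <!-- Build Action = Content, so the file is included when the site is published -->
  <Content Include="robots.txt" />
</ItemGroup>
```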

Adding this robots.txt entry will prevent conforming robots from attempting to browse controllers that require authentication (and lighten the load on your website slightly), but without the robots file they will just try the URL and fail.

Eric J. answered Sep 23 '22 10:09
