asp.net mvc exclude an action from search engine crawling

Is there a way to exclude a controller action from search engine crawling? Is there an MVC verb (attribute) that can be added above the action name?

I want to exclude the following URL from search engine crawling:

Home/Secret?type=1

But I want this one to remain available to search engine crawling:

Home/Search
asked Aug 11 '13 at 21:08 by pili


2 Answers

I think you need to dynamically generate a robots.txt file.

You should create a controller action to serve the robots.txt file.

Check Reference Here

Related to the above link is a question about allowing the .txt extension to be served by an action: https://stackoverflow.com/a/14084127/511438

public ActionResult Robots()
{
    Response.ContentType = "text/plain";
    // Here you should write a response listing the areas/controllers/actions
    // that search engines should not follow.
    return View();
}

Add a Robots.cshtml view.
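A minimal Robots.cshtml might look like the following. This is a sketch that assumes the view lives at Views/Home/Robots.cshtml and hard-codes a rule for the URL from the question rather than generating it. Layout is set to null so that no HTML layout is wrapped around the plain-text output. Because robots.txt rules are prefix matches, Disallow: /Home/Secret also covers /Home/Secret?type=1, while /Home/Search stays crawlable.

```cshtml
@{
    // No layout: the response is plain robots.txt text, not an HTML page.
    Layout = null;
}
User-agent: *
Disallow: /Home/Secret
```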

Map a route so that a request for the file calls the action above instead. Register it before the default route so that it is matched first.

routes.MapRoute("Robots.txt",
                "robots.txt",
                new { controller = "Home", action = "Robots" });
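On IIS, a request for a URL ending in .txt can be served by the static file handler and never reach MVC routing, which is what the related question above deals with. One common approach is to hand robots.txt to ASP.NET in web.config. This is a sketch assuming IIS 7+ in integrated pipeline mode on .NET 4.x; the handler name RobotsTxt is hypothetical. It also assumes there is no physical robots.txt in the site root, since routing skips existing files by default.

```xml
<configuration>
  <system.webServer>
    <handlers>
      <!-- Send GET /robots.txt through ASP.NET so the "Robots.txt" route can handle it. -->
      <add name="RobotsTxt" path="robots.txt" verb="GET"
           type="System.Web.Handlers.TransferRequestHandler"
           preCondition="integratedMode,runtimeVersionv4.0" />
    </handlers>
  </system.webServer>
</configuration>
```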

Here is the NoRobots attribute, with code to get a list of the areas/controllers/actions that carry the attribute. Apologies for parsing the full namespace text; I would love for someone to look at the reflection and improve it.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Web.Mvc;

// Marks a controller (all of its actions) or a single action that search engines should not crawl.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, AllowMultiple = false)]
public sealed class NoRobotsAttribute : Attribute
{
    // Concrete controller types defined in this assembly.
    public static IEnumerable<Type> GetControllers()
    {
        return Assembly.GetExecutingAssembly().GetTypes()
               .Where(t => typeof(Controller).IsAssignableFrom(t) && !t.IsAbstract);
    }

    // Public instance methods returning ActionResult or a subclass (ViewResult, JsonResult, ...).
    public static IEnumerable<MethodInfo> GetActions(Type controllerType)
    {
        return controllerType.GetMethods(BindingFlags.Public | BindingFlags.Instance)
               .Where(m => typeof(ActionResult).IsAssignableFrom(m.ReturnType));
    }

    // Builds the URL prefix, e.g. "/Home" or "/MyArea/Sub", from the area folder in the
    // namespace and the class name without its "Controller" suffix.
    private static string GetControllerPath(Type controllerType)
    {
        List<string> namespaceSplit = (controllerType.Namespace ?? "").Split('.').ToList();

        var areaIndex = namespaceSplit.IndexOf("Areas");
        var area = (areaIndex > -1 && areaIndex + 1 < namespaceSplit.Count)
            ? "/" + namespaceSplit[areaIndex + 1]
            : "";

        var controller = controllerType.Name;
        if (controller.EndsWith("Controller", StringComparison.Ordinal))
        {
            controller = controller.Substring(0, controller.Length - "Controller".Length);
        }

        return area + "/" + controller;
    }

    public static List<string> GetNoRobots()
    {
        var robotList = new List<string>();

        foreach (var controllerType in GetControllers())
        {
            var controllerPath = GetControllerPath(controllerType);

            // [NoRobots] on the class: one entry covers every action of the controller.
            if (controllerType.IsDefined(typeof(NoRobotsAttribute), false))
            {
                robotList.Add(controllerPath);
                continue;
            }

            // Otherwise add only the actions that carry [NoRobots]. The method name is used
            // as the action name ([ActionName] aliases are not handled); Distinct() collapses
            // overloads such as GET/POST pairs that share a name.
            robotList.AddRange(GetActions(controllerType)
                .Where(m => m.IsDefined(typeof(NoRobotsAttribute), false))
                .Select(m => controllerPath + "/" + m.Name)
                .Distinct());
        }

        return robotList;
    }
}

Usage:

[NoRobots] //Can be applied at controller or action method level.
public class HomeController : Controller
{
    [NoRobots]
    public ActionResult Index()
    {
        ViewData["Message"] = "Welcome to ASP.NET MVC!";
        List<string> x = NoRobotsAttribute.GetNoRobots();
        //-- Just some test code that wrote the result to a webpage.
        return View(x);
    }
}

... and for Areas.

namespace MVC.Temp.Areas.MyArea.Controllers
{
    using MVC.Temp.Models.Home;

    [NoRobots]
    public class SubController : Controller
    {
        [NoRobots]
        public ActionResult SomeAction()
        {
            return View();
        }

    }
}

Keep in mind that this solution relies on namespace conventions; I would welcome any improvements someone can offer.

Finally, you need to write the robots.txt file correctly, including any header information and wildcard support.
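As a sketch of that last step, the hypothetical RobotsTxtWriter.Build helper below turns a list of paths (for example, the output of NoRobotsAttribute.GetNoRobots()) into a file body: a User-agent: * header followed by one Disallow line per distinct path. It does not emit wildcard patterns; those would have to be added by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: builds a robots.txt body whose rules apply to all crawlers.
public static class RobotsTxtWriter
{
    public static string Build(IEnumerable<string> disallowedPaths)
    {
        var sb = new StringBuilder();

        // Header line: the group of rules below applies to every user agent.
        sb.Append("User-agent: *\n");

        var paths = disallowedPaths
            .Where(p => !string.IsNullOrWhiteSpace(p))     // skip empty entries
            .Select(p => p.Trim())
            .Select(p => p.StartsWith("/", StringComparison.Ordinal) ? p : "/" + p) // paths start with "/"
            .Distinct();                                     // drop repeated paths, keeping the first

        foreach (var path in paths)
        {
            sb.Append("Disallow: ").Append(path).Append('\n');
        }

        return sb.ToString();
    }
}
```

In the Robots action you could then return Content(RobotsTxtWriter.Build(NoRobotsAttribute.GetNoRobots()), "text/plain") instead of rendering a view.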

answered Oct 15 '22 at 17:10 by Valamas


If it's publicly accessible, and especially if it's linked from a page, a robot can and will find it. You can use rel="nofollow" on links, use the noindex meta tag on the page itself, or use a robots.txt file to Disallow crawling of the pages. This will stop all the honest search engines (like Google, Bing, Yahoo) from indexing or following the links, but it won't keep random bots from looking at the pages.
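For reference, the two page-level options look like this in plain HTML. The URL is the one from the question and the link text is illustrative. Both are requests that well-behaved crawlers honor, not access control.

```html
<!-- On any page that links to the secret URL: ask crawlers not to follow this link. -->
<a href="/Home/Secret?type=1" rel="nofollow">Secret</a>

<!-- Inside the <head> of the page rendered by Home/Secret: ask crawlers not to index it. -->
<meta name="robots" content="noindex" />
```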

Nevertheless, the URL is accessible to the public. If a human can visit it, so can a computer. If you want to prevent it from being accessible to the general public, you probably want to look into user authentication.

answered Oct 15 '22 at 18:10 by Steven V