Is there a way to exclude a controller action from search engine crawling? Is there an MVC attribute that can be added above the action name?
I want to exclude the following URL from search engine crawling
Home/Secret?type=1
But I want this to be available to search engine crawling
Home/Search
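(For context, if a static file were enough, a robots.txt at the site root could express this — assuming every URL under /Home/Secret should be excluded, since Disallow matches URL prefixes and ignores query strings:)

```
User-agent: *
Disallow: /Home/Secret
```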
I think you need to dynamically generate a robots.txt file, served by a controller action.
A related question covers allowing the .txt extension to be served by an action: https://stackoverflow.com/a/14084127/511438
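One way to do that (a sketch along the lines of that answer — verify against your IIS setup) is to route the otherwise-static robots.txt path through managed handlers in web.config:

```xml
<system.webServer>
  <handlers>
    <!-- Route robots.txt through ASP.NET so MVC routing can pick it up
         (assumes IIS integrated pipeline mode, .NET 4.x). -->
    <add name="RobotsTxtHandler" path="robots.txt" verb="GET"
         type="System.Web.Handlers.TransferRequestHandler"
         preCondition="integratedMode,runtimeVersionv4.0" />
  </handlers>
</system.webServer>
```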
public ActionResult Robots()
{
    Response.ContentType = "text/plain";
    //-- Write a response listing the areas/controllers/actions
    //-- that search engines should not follow.
    return View();
}
Add a Robots.cshtml view.
Map a route so that a request for the file invokes the action above:
routes.MapRoute("Robots.txt",
    "robots.txt",
    new { controller = "Home", action = "Robots" });
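The Robots.cshtml view itself can be minimal. A sketch, assuming the action passes the list from NoRobotsAttribute.GetNoRobots() (shown below) as the model, i.e. `return View(NoRobotsAttribute.GetNoRobots());`:

```
@model List<string>
@{
    Layout = null;
}User-agent: *
@foreach (var path in Model)
{
    @:Disallow: @path
}
```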
Here is the NoRobots attribute, with code to get a list of the areas/controllers/actions that carry it. It works by interpreting the full namespace text; I'd welcome suggestions for doing the reflection more robustly.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method)]
public sealed class NoRobotsAttribute : Attribute
{
    public static IEnumerable<MethodInfo> GetActions()
    {
        return Assembly.GetExecutingAssembly().GetTypes()
            .Where(t => typeof(Controller).IsAssignableFrom(t))
            .SelectMany(type =>
                type.GetMethods(BindingFlags.Public | BindingFlags.Instance)
                    .Where(a => a.ReturnType == typeof(ActionResult)));
    }

    public static IEnumerable<Type> GetControllers()
    {
        return Assembly.GetExecutingAssembly().GetTypes()
            .Where(t => typeof(Controller).IsAssignableFrom(t));
    }

    //-- Strips the conventional "Controller" suffix so the emitted
    //-- path matches the route (e.g. HomeController -> /Home).
    private static string ControllerPath(List<string> namespaceSplit)
    {
        var controllersIndex = namespaceSplit.IndexOf("Controllers");
        if (controllersIndex == -1) return "";

        var name = namespaceSplit[controllersIndex + 1];
        if (name.EndsWith("Controller"))
            name = name.Substring(0, name.Length - "Controller".Length);
        return "/" + name;
    }

    public static List<string> GetNoRobots()
    {
        var robotList = new List<string>();

        //-- Controllers marked [NoRobots]: disallow the whole controller.
        foreach (var type in GetControllers())
        {
            if (!type.GetCustomAttributes(typeof(NoRobotsAttribute), false).Any())
                continue;

            var namespaceSplit = type.FullName.Split('.').ToList();
            robotList.Add(ControllerPath(namespaceSplit));
        }

        //-- Actions marked [NoRobots]: disallow area/controller/action.
        foreach (var methodInfo in GetActions())
        {
            if (!methodInfo.GetCustomAttributes(typeof(NoRobotsAttribute), false).Any())
                continue;

            var namespaceSplit = methodInfo.DeclaringType.FullName.Split('.').ToList();
            var areaIndex = namespaceSplit.IndexOf("Areas");
            var area = areaIndex > -1 ? "/" + namespaceSplit[areaIndex + 1] : "";
            robotList.Add(area + ControllerPath(namespaceSplit) + "/" + methodInfo.Name);
        }

        return robotList;
    }
}
Usage:
[NoRobots] //-- Can be applied at controller or action method level.
public class HomeController : Controller
{
    [NoRobots]
    public ActionResult Index()
    {
        ViewData["Message"] = "Welcome to ASP.NET MVC!";

        //-- Test code that writes the result to a web page.
        List<string> x = NoRobotsAttribute.GetNoRobots();
        return View(x);
    }
}
... and for Areas.
namespace MVC.Temp.Areas.MyArea.Controllers
{
    using MVC.Temp.Models.Home;

    [NoRobots]
    public class SubController : Controller
    {
        [NoRobots]
        public ActionResult SomeAction()
        {
            return View();
        }
    }
}
Keep in mind that this solution relies on namespace conventions; I'd welcome any improvements someone can offer.
Finally, you need to write the robots.txt file correctly, including any header information and wildcard support.
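As an alternative sketch to the view-based approach (a hypothetical variant, not part of the original answer), the action could build the file content directly:

```
public ActionResult Robots()
{
    var sb = new StringBuilder();
    sb.AppendLine("User-agent: *"); //-- applies to all crawlers
    foreach (var path in NoRobotsAttribute.GetNoRobots())
    {
        sb.AppendLine("Disallow: " + path);
    }
    return Content(sb.ToString(), "text/plain");
}
```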
If the URL is publicly accessible, and especially if it is linked from a page, a robot can and will find it. You can use rel="nofollow" on links, use a noindex meta tag on the page itself, or use a robots.txt file to Disallow indexing of the pages. This will stop the honest search engines (Google, Bing, Yahoo) from indexing or following the links, but it won't keep random bots from looking at the pages.
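Concretely, the per-link and per-page options look like this (illustrative markup):

```html
<!-- On pages that link to the URL: -->
<a href="/Home/Secret?type=1" rel="nofollow">Secret report</a>

<!-- In the <head> of the page you want kept out of the index: -->
<meta name="robots" content="noindex, nofollow" />
```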
Nevertheless, the URL remains accessible to the public: if a human can visit it, so can a computer. If you want to keep it from the general public entirely, you probably want to look into user authentication.