UPDATE: I've just realised that we are using Google Mini Search to crawl the website so that we can offer Google Search on the site. This is bound to be creating an anonymous profile not only for each crawl, but maybe even for each page - would that be possible?
Hi all, some advice needed!
Our website receives approximately 50,000 hits a day, and we use anonymous ASP.NET membership profiles/users. This is resulting in millions (4.5m currently) of "active" profiles, and the database is 'crawling'. We have a nightly task that cleans up all the inactive ones.
There is no way that we have 4.5m unique visitors (our county population is only 1/2 million), could this be caused by crawlers and spiders?
Also, if we have to live with this huge number of profiles, is there any way of optimising the DB?
Thanks
Kev
Update following conversation:
Might I suggest that you implement a filter that can identify crawlers via request headers, and log the anon cookie, which you can later that same day decrypt, and then delete the anonymous aspnet_Profile and aspnet_Users records with the associated UserId.
You might be fighting a losing battle but at least you will get a clear idea of where all the traffic is coming from.
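For illustration, a minimal sketch of such a header-based check (the CrawlerDetector name and the user-agent fragments are purely illustrative, not part of ASP.NET):

using System;
using System.Web;

public static class CrawlerDetector
{
    // Hand-maintained list of user-agent fragments; tune it against your own traffic logs.
    private static readonly string[] Fragments = { "bot", "crawler", "spider", "slurp" };

    public static bool IsProbableCrawler(HttpRequest request)
    {
        string userAgent = request.UserAgent ?? string.Empty;

        foreach (string fragment in Fragments)
        {
            if (userAgent.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }

        return false;
    }
}

When it returns true you can log request.AnonymousID alongside the hit and feed that list into the night's cleanup.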
AnonymousId cookies, and by extension anonymous profiles, are valid for 90 days after last use. This can result in anonymous profiles piling up.
A very simple way to handle this is to use ProfileManager:
ProfileManager.DeleteInactiveProfiles(ProfileAuthenticationOption.Anonymous, DateTime.Now.AddDays(-7));
This will clear out all the anonymous profiles that have not been accessed in the last 7 days.
But that leaves you with the anonymous records in aspnet_Users. Membership does not expose a method similar to ProfileManager for deleting stale anonymous users.
So...
The best bet is a raw SQL attack: delete from aspnet_Profile wherever you consider the profiles stale, and then run the same kind of delete on aspnet_Users where IsAnonymous = 1.
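As a rough sketch of that cleanup, assuming the standard aspnet_Profile/aspnet_Users schema, a connection string named "MembershipDb" and a 7-day staleness cutoff (all assumptions - adjust to your own setup):

using System;
using System.Configuration;
using System.Data.SqlClient;

public static class StaleAnonymousCleanup
{
    public static void Run(int staleDays = 7)
    {
        // Delete stale anonymous profiles first, then the anonymous user rows left without a profile.
        const string sql = @"
            DELETE p
            FROM dbo.aspnet_Profile p
                JOIN dbo.aspnet_Users u ON u.UserId = p.UserId
            WHERE u.IsAnonymous = 1 AND u.LastActivityDate < @cutoff;

            DELETE FROM dbo.aspnet_Users
            WHERE IsAnonymous = 1 AND LastActivityDate < @cutoff
                AND UserId NOT IN (SELECT UserId FROM dbo.aspnet_Profile);";

        using (var connection = new SqlConnection(
            ConfigurationManager.ConnectionStrings["MembershipDb"].ConnectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            // LastActivityDate is stored in UTC by the default providers.
            command.Parameters.AddWithValue("@cutoff", DateTime.UtcNow.AddDays(-staleDays));
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}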
Good luck with that. Once you get it cleaned up, just stay on top of it.
Updated Update:
The code below is only valid on IIS7, and only if you channel all requests through ASP.NET.
You could implement a module that watches for requests to robots.txt, grabs the anonymous id cookie, and stashes it in a robots table which you can use to safely purge your membership/profile tables of robot entries every night. This might help.
Example:
using System;
using System.Diagnostics;
using System.Web;

namespace NoDomoArigatoMisterRoboto
{
    public class RobotLoggerModule : IHttpModule
    {
        #region IHttpModule Members

        public void Init(HttpApplication context)
        {
            context.PreSendRequestHeaders += PreSendRequestHeaders;
        }

        public void Dispose()
        {
            //noop
        }

        #endregion

        private static void PreSendRequestHeaders(object sender, EventArgs e)
        {
            HttpRequest request = ((HttpApplication)sender).Request;

            bool isRobot =
                request.Url.GetLeftPart(UriPartial.Path)
                       .EndsWith("robots.txt", StringComparison.InvariantCultureIgnoreCase);

            string anonymousId = request.AnonymousID;

            if (anonymousId != null && isRobot)
            {
                // log this id for pruning later
                Trace.WriteLine(string.Format("{0} is a robot.", anonymousId));
            }
        }
    }
}
Reference: http://www.codeproject.com/Articles/39026/Exploring-Web-config-system-web-httpModules.aspx
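The nightly purge that consumes that robots table could then be a join-based delete. A sketch, assuming a hypothetical dbo.RobotAnonymousIds(AnonymousId) logging table, the same hypothetical "MembershipDb" connection string, and the default provider behaviour of storing the anonymous id as the UserName of IsAnonymous rows:

using System.Configuration;
using System.Data.SqlClient;

public static class RobotPurgeJob
{
    public static void Run()
    {
        // Remove profile rows and then user rows for every anonymous id logged by the robots.txt module.
        const string sql = @"
            DELETE p
            FROM dbo.aspnet_Profile p
                JOIN dbo.aspnet_Users u ON u.UserId = p.UserId
                JOIN dbo.RobotAnonymousIds r ON r.AnonymousId = u.UserName
            WHERE u.IsAnonymous = 1;

            DELETE u
            FROM dbo.aspnet_Users u
                JOIN dbo.RobotAnonymousIds r ON r.AnonymousId = u.UserName
            WHERE u.IsAnonymous = 1;";

        using (var connection = new SqlConnection(
            ConfigurationManager.ConnectionStrings["MembershipDb"].ConnectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}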
You could try deleting anonymous profiles in the Session_End event in your Global.asax.cs file.
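A minimal Global.asax.cs sketch of that idea, assuming InProc session state (Session_End does not fire for the other session modes) and that the anonymous id was stashed in session at the start of the visit, since Request is not available in Session_End:

using System;
using System.Web;
using System.Web.Profile;

public class Global : HttpApplication
{
    protected void Session_Start(object sender, EventArgs e)
    {
        // Request is available here, so remember the anonymous id for session end.
        Session["AnonymousId"] = Request.AnonymousID;
    }

    protected void Session_End(object sender, EventArgs e)
    {
        // Only Session and Server are available here; no Request or HttpContext.
        var anonymousId = Session["AnonymousId"] as string;
        if (!string.IsNullOrEmpty(anonymousId))
        {
            // For anonymous users the profile name is the anonymous id.
            ProfileManager.DeleteProfile(anonymousId);
        }
    }
}

Like DeleteInactiveProfiles, this removes the aspnet_Profile row but still leaves the anonymous aspnet_Users record behind.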
There is every likelihood that your site is being crawled, either by a legitimate search engine crawler and/or by a malicious crawler looking for vulnerabilities that would allow attackers to take control of your site/server. You should look into this, regardless of which solution you take for removing old profiles.
If you are using the default Profile provider, which keeps all of the profile information in a single column, you might want to read this link to Scott Guthrie's article on a better-performing, table-based profile provider.