
Millions of anonymous ASP.Net profiles?

UPDATE: I've just realised that we use a Google Mini search appliance to crawl the website so that we can offer Google-powered site search. This is bound to be creating an anonymous profile for each crawl, and possibly even for each page. Would that be possible?

Hi all, some advice needed!

Our website receives approximately 50,000 hits a day, and we use anonymous ASP.NET membership profiles/users. This is resulting in millions (currently 4.5m) of "active" profiles, and the database is 'crawling'. We have a nightly task that cleans up all the inactive ones.

There is no way that we have 4.5m unique visitors (our county population is only half a million). Could this be caused by crawlers and spiders?

Also, if we have to live with this huge number of profiles, is there any way of optimising the DB?

Thanks

Kev

Mantorok asked May 04 '10 10:05



2 Answers

Update following conversation:

Might I suggest that you implement a filter that identifies crawlers via their request headers and logs the anonymous ID cookie. Later that same day you can decrypt the logged cookies and delete the anonymous aspnet_Profile and aspnet_Users records with the associated UserId.

You might be fighting a losing battle but at least you will get a clear idea of where all the traffic is coming from.
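The header-based filter suggested above could start from something like the following. This is only a minimal sketch: the class name `CrawlerDetector` and the token list are assumptions made for illustration, the list is far from complete, and the `gsa-crawler` entry assumes that is how your Google Mini identifies itself, so check your own logs before relying on it.

```csharp
using System;

public static class CrawlerDetector
{
    // Illustrative, incomplete list of User-Agent fragments (an assumption,
    // not an authoritative registry); extend it from your own server logs.
    private static readonly string[] KnownCrawlerTokens =
    {
        "googlebot", "gsa-crawler", "bingbot", "slurp", "bot", "crawler", "spider"
    };

    // Returns true when the User-Agent header looks like it belongs to a crawler.
    public static bool LooksLikeCrawler(string userAgent)
    {
        // Assumption: ordinary browsers always send a User-Agent header,
        // so a missing one is treated as a robot.
        if (string.IsNullOrEmpty(userAgent))
        {
            return true;
        }

        foreach (string token in KnownCrawlerTokens)
        {
            if (userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }

        return false;
    }
}
```

Inside a module or filter you would call `CrawlerDetector.LooksLikeCrawler(request.UserAgent)` and, when it returns true, log `request.AnonymousID` for later pruning. User-Agent headers are trivially spoofed, so this only catches crawlers that identify themselves.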


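AnonymousId cookies and, by proxy, anonymous profiles stay valid for a long time after last use: the default anonymousIdentification cookieTimeout is 100,000 minutes (roughly 69 days), with sliding expiration. This can result in the anon profiles piling up.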

A very simple way to handle this is to use ProfileManager.

ProfileManager.DeleteInactiveProfiles(ProfileAuthenticationOption.Anonymous, DateTime.Now.AddDays(-7));

will clear out all the anonymous profiles that have not been accessed in the last 7 days.
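If you want the nightly task to report what it removed, the call above can be wrapped as in the sketch below. The class name `AnonymousProfileCleanup` and the `maxAgeDays` parameter are made up for illustration; `GetNumberOfInactiveProfiles` and `DeleteInactiveProfiles` are existing `ProfileManager` methods, and the code assumes it runs inside the ASP.NET application with a profile provider configured.

```csharp
using System;
using System.Diagnostics;
using System.Web.Profile;

public static class AnonymousProfileCleanup
{
    // Deletes anonymous profiles that have not been used for maxAgeDays
    // and returns the number of profiles the provider reports as deleted.
    public static int Run(int maxAgeDays)
    {
        DateTime cutoff = DateTime.Now.AddDays(-maxAgeDays);

        // Count first, so the log shows how much work the delete is about to do.
        int stale = ProfileManager.GetNumberOfInactiveProfiles(
            ProfileAuthenticationOption.Anonymous, cutoff);
        Trace.WriteLine(string.Format("{0} stale anonymous profiles found.", stale));

        return ProfileManager.DeleteInactiveProfiles(
            ProfileAuthenticationOption.Anonymous, cutoff);
    }
}
```

Calling `AnonymousProfileCleanup.Run(7)` matches the seven-day window used above. With millions of rows a single call may take a long time, so it may be worth starting with a large `maxAgeDays` and working down.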

But that leaves you with the anonymous records in aspnet_Users. Membership does not expose a method similar to ProfileManager for deleting stale anonymous users.

So...

The best bet is a raw SQL approach: delete the rows from aspnet_Profile that you consider stale, and then run the same query on aspnet_Users where IsAnonymous = 1.
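A rough sketch of that two-step delete follows. It assumes the default schema created by aspnet_regsql (in particular the `IsAnonymous` and `LastActivityDate` columns on aspnet_Users) and that `LastActivityDate` is stored in UTC; the class name, the `@cutoffUtc` parameter and the timeout value are illustrative only. If stale anonymous users also have rows in other tables that reference aspnet_Users, the second statement will fail until those rows are removed too.

```csharp
using System;
using System.Data.SqlClient;

public static class StaleAnonymousUserPurge
{
    // Profile rows go first so that no aspnet_Profile row is left
    // pointing at a user that has already been deleted.
    public const string PurgeSql = @"
DELETE p
FROM aspnet_Profile p
JOIN aspnet_Users u ON u.UserId = p.UserId
WHERE u.IsAnonymous = 1 AND u.LastActivityDate < @cutoffUtc;

DELETE FROM aspnet_Users
WHERE IsAnonymous = 1 AND LastActivityDate < @cutoffUtc;";

    // Runs both deletes and returns the combined number of rows affected.
    public static int Run(string connectionString, DateTime cutoffUtc)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(PurgeSql, connection))
        {
            command.Parameters.AddWithValue("@cutoffUtc", cutoffUtc);
            command.CommandTimeout = 600; // seconds; arbitrary value for a large first purge
            connection.Open();
            return command.ExecuteNonQuery();
        }
    }
}
```

For example, `StaleAnonymousUserPurge.Run(connectionString, DateTime.UtcNow.AddDays(-7))` removes anonymous users idle for a week. Try it against a backup first, and consider deleting in smaller date ranges if the first run is too heavy for the live database.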

Good luck with that. Once you get it cleaned up, just stay on top of it.


Updated Update:

The code below is only valid on IIS7, and only if you channel all requests (including static files such as robots.txt) through ASP.NET.

You could implement a module that watches for requests to robots.txt, grabs the anonymous ID cookie and stashes it in a robots table. You can then use that table to safely purge the robots' records from your membership/profile tables every night. This might help.

Example:

using System;
using System.Diagnostics;
using System.Web;

namespace NoDomoArigatoMisterRoboto
{
    public class RobotLoggerModule : IHttpModule
    {
        #region IHttpModule Members

        public void Init(HttpApplication context)
        {
            context.PreSendRequestHeaders += PreSendRequestHeaders;
        }

        public void Dispose()
        {
            //noop
        }

        #endregion

        private static void PreSendRequestHeaders(object sender, EventArgs e)
        {
            HttpRequest request = ((HttpApplication)sender).Request;

            // Ordinary visitors rarely ask for robots.txt, so treat any client that requests it as a robot.
            bool isRobot =
                request.Url.GetLeftPart(UriPartial.Path).EndsWith("robots.txt", StringComparison.OrdinalIgnoreCase);

            string anonymousId = request.AnonymousID;

            if (anonymousId != null && isRobot)
            {
                // log this id for pruning later
                Trace.WriteLine(string.Format("{0} is a robot.", anonymousId));
            }
        }
    }
}

Reference: http://www.codeproject.com/Articles/39026/Exploring-Web-config-system-web-httpModules.aspx


Sky Sanders answered Nov 09 '22 06:11



You could try deleting anonymous profiles in the Session_End event in your Global.asax.cs file.
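One way this might look is sketched below. It rests on several assumptions: session state is InProc (Session_End is not raised for the other modes), the profile provider stores anonymous profiles under the anonymous ID as the user name, and the session key `"AnonymousId"` is a name invented for this example. No request is available when a session ends, so the ID has to be remembered at Session_Start.

```csharp
using System;
using System.Web;
using System.Web.Profile;

public class Global : HttpApplication
{
    private const string AnonymousIdKey = "AnonymousId"; // hypothetical session key

    protected void Session_Start(object sender, EventArgs e)
    {
        // Remember the anonymous id now, while the request is still available.
        if (!Request.IsAuthenticated && !string.IsNullOrEmpty(Request.AnonymousID))
        {
            Session[AnonymousIdKey] = Request.AnonymousID;
        }
    }

    protected void Session_End(object sender, EventArgs e)
    {
        // Only the session is available here, not the request or its cookies.
        var anonymousId = Session[AnonymousIdKey] as string;
        if (!string.IsNullOrEmpty(anonymousId))
        {
            ProfileManager.DeleteProfile(anonymousId);
        }
    }
}
```

Note that this throws away the profile of every anonymous visitor when their session times out, real people included, so it only makes sense if nothing in the anonymous profile needs to survive between visits. It also leaves the matching aspnet_Users row behind.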

There is every likelihood that your site is being crawled, by a legitimate search engine crawler and/or by a malicious crawler looking for vulnerabilities that would allow hackers to take control of your site/server. You should look into this, regardless of which solution you choose for removing old profiles.

If you are using the default Profile Provider, which keeps all of the profile information in a single column, you might want to read Scott Guthrie's article on a better-performing table-based profile provider.

Daniel Dyson answered Nov 09 '22 04:11
