Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to recommend the next achievement [closed]

Short version:

I have a similar setup to StackOverflow. Users get Achievements. I have many more achievements than SO, lets say on the order of 10k, and each user has in the 100s of achievements. Now, how would you recommend (to recommend) the next achievement for a user to try for?

Long version:

The objects are modeled like this in django (showing only important parts) :

class User(models.Model):
    alias = models.ForeignKey(Alias)

class Alias(models.Model):
    achievements = models.ManyToManyField('Achievement', through='Achiever')

class Achievement(models.Model):
    points = models.IntegerField()

class Achiever(models.Model):
    achievement = models.ForeignKey(Achievement)
    alias = models.ForeignKey(Alias)
    count = models.IntegerField(default=1)

and my algorithm is just to find every other user that has a shared achievement with the logged in user, and then go through all their achievements and sort by number of occurrences :

def recommended(request) :
    user = request.user.get_profile()

    // The final response
    r = {}

    // Get all the achievements the user's aliases have received 
    // in a set so they aren't double counted
    achievements = set()
    for alias in user.alias_set.select_related('achievements').all() :
        achievements.update(alias.achievements.all())

    // Find all other aliases that have gotten at least one of the same
    // same achievements as the user
    otherAliases = set()
    for ach in achievements :
        otherAliases.update(ach.alias_set.all())

    // Find other achievements the other users have gotten in addition to
    // the shared ones.
    // And count the number of times each achievement appears
    for otherAlias in otherAliases :
        for otherAch in otherAlias.achievements.all() :
            r[otherAch] = r.get(otherAch, 0) + 1

    // Remove all the achievements that the user has already gotten
    for ach in achievements :
        r.pop(ach)

    // Sort by number of times the achievements have been received
    r = sorted(r.items(), lambda x, y: cmp(x[1], y[1]), reverse=True)

    // Put in the template for showing on the screen
    template_values = {}
    template_values['achievements'] = r

But it takes FOREVER to run, and always returns the whole list, which is unneeded. A user would only need the top few achievements to go after.

So, I'm welcome to recommendations on other algorithms and/or code improvements. I'll give you an achievement in my system for coming up with the recommendation algorithm :)

like image 540
Paul Tarjan Avatar asked Jul 04 '09 08:07

Paul Tarjan


2 Answers

One method you can recommend which achievements to go for is to see how many of your users already have those achievements and recommend those popular ones. When they have achieved those you go down the list and recommend slightly less popular ones. However, this has a naive assumption that everyone wants to go for popular achievements. It might cause popular achievements to be even more popular and less popular ones, well... A consolation is that this doesn't take up much resources and is likely to run very fast. (Just keep a list of achievements + number of times it's achieved)

Another method (which attempts to guess which achievements the user is likely to go after based on what achievements he already had) is to use some machine learning algorithms. I think the k-nearest neighbor algorithm will perform quite well here. Select a threshold and just output everything that is above this threshold. Now, I don't know if this will run faster than what you already have, but you should just run the recommendation engine once every time the user has made a new achievement, store the top (let's say) five, and just output it back to the user whenever a recommendation is needed.

I hope this helps. =)

like image 79
wai Avatar answered Oct 26 '22 05:10

wai


I would suggest that you do the first three steps (achievements, otherAliases, count) as one single SQL statement. As it is now, you are issuing a lot of queries and summarising thousands of rows in Python which is a task you should delegate to the DB. For example the code

for otherAlias in otherAliases : #For every single other user
    for otherAch in otherAlias.achievements.all() : #execute a query
        r[otherAch] = r.get(otherAch, 0) + 1

Does thousands of huge queries.

Instead, you can use SQL to do this by joining Achiever on itself based on Alias id being different and achievement id being the same. You then group by achievement id and run a count.

In the query below, the table "B" is other user's achievements and "Achiever" is our achievements. If any other user shares an achievement, they appear once in "B" for each achievement they share. We then group those by alias_id and count the number of times they appeared so you get a nice id, count table out.

Very very rough code (no SQL available here)

SELECT B.Alias_id, COUNT(B.achievement_id) 
  FROM Achiever, Achiever as B 
  WHERE Achiever.achievement_id == B.achievement_id 
     AND Achiever.Alias_id == <insert current user alias here>;
  GROUP BY B.Alias_id

If that works the way I think it will, you will get a table of other user aliases, along with the number of achievements they share with the current user.

The next thing you do is an SQL statement that uses the one above as an "inner select" - call it users. You join that with your achievements table and your Achiever table for the current user. You might want to ignore all but the top 10 users who are similar to the current user.

I don't have time to write up a good query right now, but look at the JOIN statement for your DB that joins on achievement_id between the nominated 10 users and the current user - setting that id to NULL if it doesn't exist. The filter only to rows where it turned up NULL (unachieved achievements).

like image 2
Tom Leys Avatar answered Oct 26 '22 04:10

Tom Leys