Recommendation algorithm (and implementation) for finding similar items and users

I have a database of about 700k users along with items they have watched/listened to/read/bought/etc. I would like to build a recommendation engine that recommends new items based on what users with similar taste in things have enjoyed, as well as actually finding people the user might want to be friends with on a social network I'm building (similar to last.fm).

My requirements are as follows:

  • Majority of the "users" in my database aren't actually users of my website. They have been data mined from third-party sources. However, when recommending users, I would like to limit the search to people who are members of my website (while still taking advantage of the bigger data set).
  • I need to take multiple items into consideration. Not "people who like this one item you enjoyed...", but "people who like most of the items you enjoyed...".
  • I need to compute similarities between users and show them when viewing their profiles (taste-o-meter).
  • Some items are rated, others are not. Ratings are on a scale of 1 to 10, not boolean values. In most cases it would be possible to deduce a rating value from other stats if it's not present (e.g. if the user has favourited an item but hasn't rated it, I could just assume a rating of 9).
  • It has to interact with Python code in one way or another. Preferably, it should use a separate (possibly NoSQL) database and expose an API to use in my web back-end. The project I'm making uses Pyramid and SQLAlchemy.
  • I would like to take item genres into account.
  • I would like to display similar items on item pages based on both its genre (possibly tags) and what users who enjoyed the item liked (like Amazon's "people who bought this item" and Last.fm artist pages). Items from different genres should still be shown, but have a lower similarity value.
  • I would prefer a well-documented implementation of an algorithm with some examples.

Please don't give an answer like "use pysuggest or mahout", since those implement a plethora of algorithms and I'm looking for one that's most suitable for my data/use. I've been interested in Neo4j and how it all could be expressed as a graph of connections between users and items.

vomitcuddle asked Jan 19 '12 19:01


1 Answer

To determine similarity between users, you can compute cosine or Pearson similarity (found in Mahout, and really everywhere on the net!) across the user vectors. So your data representation should look something like this, with one vector per user and each position referring to the same item for every user:

 u1  [1,2,3,4,5,6] 
 u2  [35,24,3,4,5,6] 
 u3  [35,3,9,2,1,11] 
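A minimal sketch of the two measures in plain Python, assuming dense, equal-length vectors like the u1/u2/u3 example (the function names are illustrative, not Mahout's API). Pearson correlation is computed here as the cosine similarity of mean-centred vectors, which is mathematically equivalent:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def pearson_similarity(a, b):
    """Pearson correlation: cosine similarity after subtracting each vector's mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])

# Example vectors from the answer; each position is the same item for every user.
u1 = [1, 2, 3, 4, 5, 6]
u2 = [35, 24, 3, 4, 5, 6]
u3 = [35, 3, 9, 2, 1, 11]

for name, vec in (("u2", u2), ("u3", u3)):
    print(name, round(cosine_similarity(u1, vec), 3), round(pearson_similarity(u1, vec), 3))
```

Pearson is often preferred for explicit ratings because centring removes each user's personal rating bias (one user's "7" may be another user's "9").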

Where you want to take multiple items into consideration, you can use the above to determine how similar two users' profiles are. The higher the correlation score, the more likely it is that they have very similar items. You can set a threshold so that, for example, anyone with a similarity of 0.75 or more is treated as having a similar set of items in their profile.
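As a sketch of the threshold idea combined with the question's members-only requirement: similarity is computed against every user vector (data-mined users included), but only users in a hypothetical `members` set are returned. The 0.75 default is just the example figure from the paragraph above; `similar_members`, `vectors` and `members` are illustrative names, not an existing API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similar_members(target, vectors, members, threshold=0.75):
    """Return (user, score) pairs for site members at or above `threshold`, best first.

    `vectors` may contain every user, including data-mined ones; the `members`
    filter only restricts which users can appear in the result.
    """
    target_vec = vectors[target]
    matches = []
    for user, vec in vectors.items():
        if user == target or user not in members:
            continue  # skip the user themself and anyone who isn't a site member
        score = cosine_similarity(target_vec, vec)
        if score >= threshold:
            matches.append((user, score))
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches
```

The returned score for a single pair of users could also serve as the raw value behind a per-profile taste-o-meter.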

Where values are missing, you can of course make up your own. I'd just keep them binary and try blending several different algorithms; combining the output of multiple models like this is called an ensemble.
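One way to read this advice, sketched under assumptions: every interaction becomes a 1 in a binary vector whether or not a rating exists, and the outputs of several recommenders are combined with a weighted average, which is one of the simplest forms of ensemble. The model names (`cf`, `genre`) and weights are hypothetical; in practice the weights would be tuned on held-out data.

```python
def to_binary_vector(interactions, item_ids):
    """1 if the user interacted with the item at all, else 0.

    `interactions` maps item id -> rating or None; the rating value itself is ignored.
    """
    return [1 if item in interactions else 0 for item in item_ids]

def blend(scores_by_model, weights):
    """Weighted average of per-item scores from several models (a simple ensemble).

    An item missing from one model's output contributes 0 for that model.
    """
    total_weight = sum(weights.values())
    blended = {}
    for model, scores in scores_by_model.items():
        weight = weights[model]
        for item, score in scores.items():
            blended[item] = blended.get(item, 0.0) + weight * score
    return {item: value / total_weight for item, value in blended.items()}

# Hypothetical usage: a collaborative-filtering model weighted 3x a genre-based one.
combined = blend(
    {"cf": {"x": 1.0, "y": 0.5}, "genre": {"x": 0.0, "z": 1.0}},
    {"cf": 3, "genre": 1},
)
print(sorted(combined.items(), key=lambda kv: kv[1], reverse=True))
```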

Overall, what you are looking for is called item-based collaborative filtering, both for the recommendation side of your setup and for identifying similar items. It's a standard recommendation algorithm that covers pretty much everything you've asked for.
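A minimal in-memory sketch of item-based collaborative filtering, assuming ratings are held as `{user: {item: rating}}` (a hypothetical layout, not tied to any library). Items are compared by the cosine similarity of their user-rating vectors, and an unseen item's score is the similarity-weighted average of the user's own ratings on related items. The same `sparse_cosine(items[x], items[y])` value could drive "people who liked this item also liked" lists on item pages.

```python
import math
from collections import defaultdict

def item_vectors(ratings):
    """Invert {user: {item: rating}} into {item: {user: rating}}."""
    items = defaultdict(dict)
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            items[item][user] = rating
    return items

def sparse_cosine(a, b):
    """Cosine similarity between two sparse {key: value} vectors."""
    common = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, ratings, top_n=10):
    """Predict scores for items `user` hasn't rated; return the top_n (item, score) pairs."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, candidate_vec in items.items():
        if candidate in seen:
            continue
        num = den = 0.0
        for item, rating in seen.items():
            sim = sparse_cosine(items[item], candidate_vec)
            if sim > 0:
                num += sim * rating  # weight the user's rating by item similarity
                den += sim
        if den:
            scores[candidate] = num / den
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

This recomputes every similarity on each call, which would not scale to 700k users; a real deployment would typically precompute and store each item's top-N neighbours offline and only do the weighted average at request time.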

When trying to find similar users, you can compute some type of similarity metric across your user vectors.

Regarding Python, the book Programming Collective Intelligence gives all its samples in Python, so go pick up a copy and read chapter 2 ("Making Recommendations").

Representing all of this as a graph will be somewhat problematic, as your underlying representation is a bipartite graph (users on one side, items on the other). There are lots of recommendation approaches out there that use a graph-based approach, but it's generally not the best-performing approach.
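To illustrate what the bipartite structure looks like without a graph database, here is a sketch using plain Python sets: users link to items and items link back to users, so finding users with overlapping taste is a two-hop walk (user → item → user). The function names are illustrative; a graph database such as Neo4j would express the same traversal in its own query language.

```python
from collections import Counter, defaultdict

def build_bipartite(interactions):
    """Build both sides of a user-item bipartite graph from (user, item) edges."""
    user_items = defaultdict(set)
    item_users = defaultdict(set)
    for user, item in interactions:
        user_items[user].add(item)
        item_users[item].add(user)
    return user_items, item_users

def shared_item_counts(user, user_items, item_users):
    """Walk user -> item -> user and count how many items `user` shares with each other user."""
    counts = Counter()
    for item in user_items[user]:
        for other in item_users[item]:
            if other != user:
                counts[other] += 1
    return counts
```

Raw overlap counts favour very active users, so they would normally be normalised, for example with cosine or Pearson similarity as described earlier.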

Steve answered Nov 11 '22 19:11