Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create my own recommendation engine? [closed]

I am interested in recommendation engines these days and I want to improve myself in this area. I am currently reading "Programming Collective Intelligence" I think this is the best book about this subject, from O'Reilly. But I don't have any ideas how to implement engine; What I mean by "no idea" is "don't know how to start". I have a project like Last.fm in my mind.

  1. Where do (should be implemented on database side or backend side) I start creating recommendation engine?
  2. What level of database knowledge will be needed?
  3. Is there any open source ones that can be used for help or any resource?
  4. What should be the first steps that I have to do?
like image 923
Burak Dede Avatar asked Sep 10 '09 21:09

Burak Dede


People also ask

How do you make a recommendation engine?

To build a system that can automatically recommend items to users based on the preferences of other users, the first step is to find similar users or items. The second step is to predict the ratings of the items that are not yet rated by a user.

How long does it take to build a recommender system?

According to the company' blog post, the first tuning and training of the model take about five days, before it can actually begin to recommend products for customers.

What makes a good recommendation engine?

There are three main types of recommendation engines: collaborative filtering, content-based filtering – and a hybrid of the two. Collaborative filtering focuses on collecting and analyzing data on user behavior, activities, and preferences, to predict what a person will like, based on their similarity to other users.


2 Answers

Presenting recommendations can be split up in to two main sections:

  1. Feature extraction
  2. Recommendation

Feature extraction is very specific to the object being recommended. For music, for example, some features of the object might be the frequency response of the song, the power, the genre, etc. The features for the users might be age, location, etc. You then create a vector for each user and song with the various elements of the vector corresponding to different features of interest.

Performing the actual recommendation only requires well thought out feature vectors. Note that if you don't choose the right features your recommendation engine will fail. This would be like asking you to tell me my sex based on my age. Of course my age may provide a bit of information, but I think you could imagine better questions to ask. Anyways, once you have your feature vectors for each user and song, you will need to train the recommendation engine. I think the best way to do this would be to get a whole bunch of users to take your demographic test and then tell you specific songs that they like. At this point you have all the information you need. Your job is to draw a decision boundary with the information you have. Consider a simple example. You want to predict whether or not a user likes AC/DC's "Back in Black" based on age and sex. Imagine a graph showing 100 data points. The x axis is age, the y axis is sex (1 is male, 2 is female). A black mark indicates that the user likes the song while a red mark means they don't like the song. My guess is that this graph might have a lot of black marks corresponding to users that are male and between the ages of 12 and 37 while the rest of the marks will be red. So, if we were to manually select a decision boundary, it'd be a rectangle around this area holding the majority of the black marks. This is called the decision boundary because, if a completely new person comes to you and tells you their age and sex, you only have to plot them on the graph and ask whether or not they fall within that box.

So, the hard part here is finding the decision boundary. The good news is that you don't need to know how to do that. You just need to know how to use some of the common tools. You can look into using neural networks, support vector machines, linear classifiers, etc. Again, don't let the big names fool you. Most people can't tell you what these things are really doing. They just know how to plug things in and get results.

I know it's a bit late, but I hope this helps anyone that stumbles on this thread.

like image 90
Josh Avatar answered Sep 20 '22 14:09

Josh


I've built up one for a video portal myself. The main idea that I had was about collecting data about everything:

  • Who uploaded a video?
  • Who commented on a video?
  • Which tags where created?
  • Who visited the video? (also tracking anonymous visitors)
  • Who favorited a video?
  • Who rated a video?
  • Which channels was the video assigned to?
  • Text streams of title, description, tags, channels and comments are collected by a fulltext indexer which puts weight on each of the data sources.

Next I created functions which return lists of (id,weight) tuples for each of the above points. Some only consider a limited amount of videos (eg last 50), some modify the weight by eg rating, tag count (more often tagged = less expressive). There are functions that return the following lists:

  • Similar videos by fulltext search
  • Videos uploaded by the same user
  • Other videos the users from these comments also commented on
  • Other videos the users from these favorites also favorited
  • Other videos the raters from these ratings also rated on (weighted)
  • Other videos in the same channels
  • Other videos with the same tags (weighted by "expressiveness" of tags)
  • Other videos played by people who played this video (XY latest plays)
  • Similar videos by comments fulltext
  • Similar videos by title fulltext
  • Similar videos by description fulltext
  • Similar videos by tags fulltext

All these will be combined into a single list by just summing up the weights by video ids, then sorted by weight. This works pretty well for around 1000 videos now. But you need to do background processing or extreme caching for this to be speedy.

I'm hoping that I can reduce this to a generic recommendation engine or similarity calculator soon and release as a rails/activerecord plugin. Currently it's still a well integrated part of my project.

To give a small hint, in ruby code it looks like this:

def related_by_tags   tag_names.find(:all, :include => :videos).inject([]) { |result,t|     result + t.video_ids.map { |v|       [v, TAG_WEIGHT / (0.1 + Math.log(t.video_ids.length) / Math.log(2))]     }   } end 

I would be interested on how other people solve such algorithms.

like image 43
hurikhan77 Avatar answered Sep 17 '22 14:09

hurikhan77