How to create my own recommendation engine? [closed]

Tags:

I am interested in recommendation engines these days and I want to improve myself in this area. I am currently reading "Programming Collective Intelligence" I think this is the best book about this subject, from O'Reilly. But I don't have any ideas how to implement engine; What I mean by "no idea" is "don't know how to start". I have a project like Last.fm in my mind.

Where do (should be implemented on database side or backend side) I start creating recommendation engine?
What level of database knowledge will be needed?
Is there any open source ones that can be used for help or any resource?
What should be the first steps that I have to do?

923

asked Sep 10 '09 21:09

Burak Dede

2 Answers

Presenting recommendations can be split up in to two main sections:

Feature extraction
Recommendation

Feature extraction is very specific to the object being recommended. For music, for example, some features of the object might be the frequency response of the song, the power, the genre, etc. The features for the users might be age, location, etc. You then create a vector for each user and song with the various elements of the vector corresponding to different features of interest.

Performing the actual recommendation only requires well thought out feature vectors. Note that if you don't choose the right features your recommendation engine will fail. This would be like asking you to tell me my sex based on my age. Of course my age may provide a bit of information, but I think you could imagine better questions to ask. Anyways, once you have your feature vectors for each user and song, you will need to train the recommendation engine. I think the best way to do this would be to get a whole bunch of users to take your demographic test and then tell you specific songs that they like. At this point you have all the information you need. Your job is to draw a decision boundary with the information you have. Consider a simple example. You want to predict whether or not a user likes AC/DC's "Back in Black" based on age and sex. Imagine a graph showing 100 data points. The x axis is age, the y axis is sex (1 is male, 2 is female). A black mark indicates that the user likes the song while a red mark means they don't like the song. My guess is that this graph might have a lot of black marks corresponding to users that are male and between the ages of 12 and 37 while the rest of the marks will be red. So, if we were to manually select a decision boundary, it'd be a rectangle around this area holding the majority of the black marks. This is called the decision boundary because, if a completely new person comes to you and tells you their age and sex, you only have to plot them on the graph and ask whether or not they fall within that box.

So, the hard part here is finding the decision boundary. The good news is that you don't need to know how to do that. You just need to know how to use some of the common tools. You can look into using neural networks, support vector machines, linear classifiers, etc. Again, don't let the big names fool you. Most people can't tell you what these things are really doing. They just know how to plug things in and get results.

I know it's a bit late, but I hope this helps anyone that stumbles on this thread.

answered Sep 20 '22 14:09

Josh

I've built up one for a video portal myself. The main idea that I had was about collecting data about everything:

Who uploaded a video?
Who commented on a video?
Which tags where created?
Who visited the video? (also tracking anonymous visitors)
Who favorited a video?
Who rated a video?
Which channels was the video assigned to?
Text streams of title, description, tags, channels and comments are collected by a fulltext indexer which puts weight on each of the data sources.

Next I created functions which return lists of (id,weight) tuples for each of the above points. Some only consider a limited amount of videos (eg last 50), some modify the weight by eg rating, tag count (more often tagged = less expressive). There are functions that return the following lists:

Similar videos by fulltext search
Videos uploaded by the same user
Other videos the users from these comments also commented on
Other videos the users from these favorites also favorited
Other videos the raters from these ratings also rated on (weighted)
Other videos in the same channels
Other videos with the same tags (weighted by "expressiveness" of tags)
Other videos played by people who played this video (XY latest plays)
Similar videos by comments fulltext
Similar videos by title fulltext
Similar videos by description fulltext
Similar videos by tags fulltext

All these will be combined into a single list by just summing up the weights by video ids, then sorted by weight. This works pretty well for around 1000 videos now. But you need to do background processing or extreme caching for this to be speedy.

I'm hoping that I can reduce this to a generic recommendation engine or similarity calculator soon and release as a rails/activerecord plugin. Currently it's still a well integrated part of my project.

To give a small hint, in ruby code it looks like this:

def related_by_tags   tag_names.find(:all, :include => :videos).inject([]) { |result,t|     result + t.video_ids.map { |v|       [v, TAG_WEIGHT / (0.1 + Math.log(t.video_ids.length) / Math.log(2))]     }   } end

I would be interested on how other people solve such algorithms.

answered Sep 17 '22 14:09

hurikhan77

Related questions
                            
                                Show all tables inside a MySQL database using PHP?
                            
                                MySQL equivalent of DECODE function in Oracle
                            
                                How to add static value when doing INSERT INTO with SELECT in a MySQL query?
                            
                                Is there a performance decrease if there are too many columns in a table?
                            
                                SQL best practice to deal with default sort order
                            
                                Import and export schema in cassandra
                            
                                How to find number of rows in heroku database
                            
                                Saving credit card information in MySQL database? [closed]
                            
                                why would you use WHERE 1=0 statement in SQL?
                            
                                FactoryGirl + Faker - same data being generated for every object in db seed data
                            
                                String, decimal, or float datatype for price field?
                            
                                Update xampp from maria db 10.1 to 10.2
                            
                                How do I load a sql.gz file to my database? (importing)
                            
                                What Are the Pros and Cons of Filemaker? [closed]
                            
                                How do I get the ID of multiple inserted rows in MySQL?
                            
                                What is the "best" way to store international addresses in a database?
                            
                                How do I create a local database inside of Microsoft SQL Server 2014?
                            
                                Paxos vs two phase commit
                            
                                Strategy for Offline/Online data synchronization
                            
                                What are PostgreSQL RULEs good for?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to create my own recommendation engine? [closed]

Tags:

database

recommendation-engine

collective-intelligence

Burak Dede

People also ask

2 Answers

Josh

hurikhan77

Recent Activity

Donate For Us