
Implementing recommendation system for unsupervised learning

I have been reading papers and books about recommendation systems and the approaches suggested for building them. Many of them use the Netflix competition as an example. On Netflix, users rate movies from 1 to 5. In that competition, the competitors were given a database of movies and the corresponding user ratings, and they were asked to build a system that would best predict the ratings of movies and use those predictions to suggest movies to users.

For evaluation, they suggest cross-validation with measures that take the predicted and actual ratings as arguments. The predicted rating is calculated from the user's history and their ratings for other movies.

I am trying to build a news recommendation system. The problem I am facing is that news items are relevant only for a short time, and almost nobody would rate them. So I only have implicit feedback (views) and no explicit feedback (ratings). Also, in the Netflix problem the competitors were provided with a database. I am wondering how to cope with the cold start problem, because at the start no news items will have been read (viewed).

I would be very thankful if you could suggest how to avoid the cold start problem and, once I have an algorithm, how I could test whether it works well.

Thank you!

giliev asked Oct 20 '22 18:10



2 Answers

Movies are an excellent use case for classic collaborative filtering: they're items people are interested in for a long time, there are relatively few of them, many people have overlapping interests, and star ratings make sense. News stories are completely different. Rather than collaborative filtering, look at content-based filtering. That's where people's interests align with content identifiers (which could be keywords about the news story, or the publisher, or metadata about time of day or region of the world). View counts are your best bet for information about people's preferences, and they also allow you to use some data mining techniques like association rule mining.
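As a minimal sketch of content-based filtering on implicit feedback, the snippet below builds a keyword profile from the articles a user has viewed and ranks fresh articles by cosine similarity to that profile. The function names (`build_profile`, `cosine`, `recommend`) and the sample data are assumptions made up for illustration, not part of any particular library.

```python
import math
from collections import Counter

def build_profile(viewed_articles):
    """Sum keyword counts over every article the user viewed (implicit feedback)."""
    profile = Counter()
    for keywords in viewed_articles:
        profile.update(keywords)
    return profile

def cosine(a, b):
    """Cosine similarity between two sparse keyword-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(profile, candidates, top_n=2):
    """Rank candidate articles by similarity to the profile; drop zero scores."""
    scored = [(cosine(profile, Counter(kw)), title) for title, kw in candidates.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [title for score, title in scored[:top_n] if score > 0]

# Hypothetical data: keyword lists of the articles one user has viewed.
views = [["election", "senate", "vote"], ["election", "poll"]]
# Fresh articles with no views yet; they can still be scored by their content.
fresh = {
    "Senate vote delayed": ["senate", "vote", "delay"],
    "Cup final tonight": ["football", "final"],
    "New poll released": ["poll", "election"],
}
profile = build_profile(views)
top = recommend(profile, fresh)
print(top)
```

Because the score depends only on an article's keywords, a story can be recommended the moment it is published, before anyone has viewed it.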

While you'll still have the user cold start problem -- where a new user in your system has given you no information about her preferences, unless you bootstrap it from mining her tweets or Facebook interests or something of the sort -- you can avoid the item cold start problem. Instead of relying on news stories read through your community as the only way to get item similarities, you can use another corpus. In particular, try Wikipedia, and check out WikiBrain (https://github.com/shilad/wikibrain). That's an API through which you can get the similarity of one concept to another, and apply it to your recommendation needs.
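To illustrate how similarities from an external corpus could score an article nobody has read yet, here is a rough sketch. The `CONCEPT_SIMILARITY` table is a hypothetical stand-in for scores you would obtain from a corpus such as Wikipedia; the numbers are invented, and none of this reflects WikiBrain's actual API.

```python
# Hypothetical concept-to-concept similarity scores in [0, 1], assumed to be
# precomputed from an external corpus. The values below are made up.
CONCEPT_SIMILARITY = {
    frozenset(["Election", "Parliament"]): 0.8,
    frozenset(["Election", "Football"]): 0.1,
    frozenset(["Parliament", "Football"]): 0.05,
}

def concept_similarity(a, b):
    """Look up a symmetric similarity; identical concepts score 1, unknown pairs 0."""
    if a == b:
        return 1.0
    return CONCEPT_SIMILARITY.get(frozenset([a, b]), 0.0)

def score_new_article(article_concepts, user_concepts):
    """Average, over the article's concepts, of the best match among the user's concepts."""
    if not article_concepts or not user_concepts:
        return 0.0
    best = [max(concept_similarity(c, u) for u in user_concepts) for c in article_concepts]
    return sum(best) / len(best)

# A user who has so far only read about elections, and two unread articles.
user_concepts = ["Election"]
politics = score_new_article(["Parliament"], user_concepts)
sports = score_new_article(["Football"], user_concepts)
print(politics, sports)
```

The point of the design is that the item similarities never depend on view counts inside your own community, so a brand-new story is comparable from the start.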

Dan Jarratt answered Oct 27 '22 15:10



To get started with this project, I would suggest clustering to find patterns in the news items that are relevant or popular. The more features you incorporate that add value to your results, the better, though this part needs careful selection, study, and statistical analysis.
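A small sketch of what clustering could look like here, using plain k-means written out by hand. The two features per article and all the numbers are hypothetical, and taking the first k points as initial centroids is a simplification to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assignment[i] == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Hypothetical features per article:
# (views in the first hour, share of readers who finished the article).
articles = [(900, 0.8), (40, 0.2), (850, 0.7), (30, 0.3), (1000, 0.9), (55, 0.25)]
labels, centers = kmeans(articles, k=2)
print(labels)
```

In this toy data the view counts dominate the distance because the features are on very different scales; with real features you would normally rescale them first.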

For news recommendation you can take a layered approach. The first layer could scan for articles that are 'positive' or contain certain keywords, based on the comments people left on each article.

The second layer could then cross-reference Twitter's response to that article with Facebook likes and traffic, how many Pinterest users pinned it, and so on.

You might also check trending keywords from Google, Bing, etc. on particular topics to ensure that the article you are showing is 'relevant'.
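One way to combine these layers is a weighted score per article. The sketch below is only an illustration: the weights, the cap on social counts, and the field names are all assumptions, and in practice you would fetch the counts from each service yourself and tune the weights against real click data.

```python
# Hypothetical weights; they would need tuning against real click data.
WEIGHTS = {"comment_keywords": 0.4, "social": 0.4, "trending": 0.2}

def comment_keyword_score(comments, keywords):
    """Layer 1: fraction of comments that mention at least one chosen keyword."""
    if not comments:
        return 0.0
    hits = sum(1 for c in comments if any(k in c.lower() for k in keywords))
    return hits / len(comments)

def social_score(counts, cap=1000):
    """Layer 2: total tweets, likes, and pins, capped and scaled to [0, 1]."""
    return min(sum(counts.values()), cap) / cap

def trending_score(article_terms, trending_terms):
    """Layer 3: share of the article's terms that are currently trending."""
    terms = set(article_terms)
    if not terms:
        return 0.0
    return len(terms & set(trending_terms)) / len(terms)

def relevance(article, keywords, trending_terms):
    """Weighted sum of the three layer scores."""
    layers = {
        "comment_keywords": comment_keyword_score(article["comments"], keywords),
        "social": social_score(article["social"]),
        "trending": trending_score(article["terms"], trending_terms),
    }
    return sum(WEIGHTS[name] * value for name, value in layers.items())

# Made-up article data for illustration.
article = {
    "comments": ["Great analysis", "Boring", "great read", "meh"],
    "social": {"tweets": 300, "likes": 150, "pins": 50},
    "terms": ["budget", "tax"],
}
score = relevance(article, keywords=["great"], trending_terms=["tax", "weather"])
print(score)
```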

I also suggest starting small, because there are so many articles on the web: focus on one topic first and then generalize. As you may notice, an article's popularity is often linked to certain voices that people follow, so that's another way of gauging the relevance of an article.

Here's more info on unsupervised learning: http://en.wikipedia.org/wiki/Unsupervised_learning

You might also want to look into Expectation Maximization (EM), which fits a model when some of the variables are unobserved (latent). Here's a full explanation of EM: https://stats.stackexchange.com/questions/72774/numerical-example-to-understand-expectation-maximization
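For a feel of how EM works, here is a generic sketch that fits a mixture of two one-dimensional Gaussians. The unobserved variable is which group each data point came from. The reading-time data and the idea of two reader groups are hypothetical, chosen only to give the numbers a news flavour.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, iterations=50):
    """EM for a two-component 1-D Gaussian mixture."""
    means = [min(data), max(data)]  # crude initial guesses
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(2)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances

# Hypothetical reading times in minutes: skimmers and thorough readers mixed together.
reading_times = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.3, 4.8, 5.1, 4.9]
weights, means, variances = em_two_gaussians(reading_times)
print(weights, means)
```

The E-step guesses the hidden group membership of each point given the current parameters, and the M-step updates the parameters given those guesses; repeating the two steps is the whole algorithm.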

macmania314 answered Oct 27 '22 17:10
