
Collaborative Filtering adding new users and items

I'm building a recommendation engine for movies and have read a lot of the good material that's out there. One thing I never see mentioned is how to make recommendations for new users and items. The normal process goes: I build my model and train it. I then pass in a user along with the number k of top recommendations I want returned for them.

Now, what if I want to do this for a user that was not in my initial sparse ratings matrix? If I have a sparse array of movie ratings for this new user, is there an easy way of incorporating it into the model without re-training the whole model again from scratch?

I know content-based filtering is used to solve the "cold-start" problem of CF. Is that my only option even if I have some ratings for this new user already?

Right now I am looking into Weighted Alternating Least Squares (WALS), and eventually I'll want to do this for SGD as well.

Sogun asked Jul 28 '18 20:07



2 Answers

I think what you are looking for is how to fold in a new item/user for matrix factorization collaborative filtering. This was already discussed here: How can I handle new users/items in model generated by Spark ALS from MLlib? That discussion points to example solutions, some with code. It covers the Spark ALS implementation, but the main idea stays the same.
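
To make the fold-in idea concrete, here is a minimal NumPy sketch, assuming you already have a trained item-factor matrix. It holds the item factors fixed and solves a small regularized least-squares problem for the new user's vector, which is one ALS half-step restricted to that user. The function names (`fold_in_user`, `top_k`), the `reg` value, and the toy data are my own assumptions for illustration, not the Spark ALS API. The optional `weights` argument is only a rough stand-in for the per-observation weighting in WALS.

```python
import numpy as np

def fold_in_user(item_factors, rated_items, ratings, reg=0.1, weights=None):
    """Solve for one new user's latent vector with the item factors held fixed.

    Minimizes sum_i w_i * (r_i - u . v_i)^2 + reg * ||u||^2 over u.
    """
    V = item_factors[rated_items]             # (n_rated, k) factors of the rated items
    r = np.asarray(ratings, dtype=float)      # (n_rated,) observed ratings
    w = np.ones_like(r) if weights is None else np.asarray(weights, dtype=float)
    k = V.shape[1]
    A = (V * w[:, None]).T @ V + reg * np.eye(k)   # V^T W V + reg * I
    b = V.T @ (w * r)                              # V^T W r
    return np.linalg.solve(A, b)

def top_k(user_vector, item_factors, exclude, k=3):
    """Rank all items by predicted score, skipping items the user already rated."""
    scores = item_factors @ user_vector
    scores[list(exclude)] = -np.inf
    return np.argsort(-scores)[:k]

# Toy data (hypothetical): 6 items, 2 latent factors, as if taken from a trained model.
rng = np.random.default_rng(0)
item_factors = rng.normal(size=(6, 2))

# Ratings generated from a known "true" user vector so the result can be checked.
true_user = np.array([1.0, -0.5])
rated_items = [0, 2, 5]
ratings = item_factors[rated_items] @ true_user

new_user = fold_in_user(item_factors, rated_items, ratings, reg=1e-8)
recommended = top_k(new_user, item_factors, exclude=rated_items, k=3)
```

The same trick works for a new item: hold the user factors fixed and solve for the item's vector from the ratings it has received. The folded-in vector is only an approximation of what a full retrain would give, so periodic retraining is still worthwhile.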

Bartłomiej Twardowski answered Oct 21 '22 23:10


One thing I never see mentioned is how to make recommendations for new users and items.

This is a difficult undertaking. In the case of a complete user cold start, additional data must be used to relate the new user to other (already known) users. Typical approaches use, for example, demographic data to cluster users in advance:

Safoury, Laila, and Akram Salah. "Exploiting user demographic attributes for solving cold-start problem in recommender system." Lecture Notes on Software Engineering 1.3 (2013): 303-307.

Basically, the trick when making suggestions for completely new users is to describe them in terms of features the algorithm has seen during the training phase. The same applies to a complete item cold start. Please note the difference between complete and partial cold start problems. In the partial case some information about the user/item is already available, and the question is whether it is "sufficient".
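
As a rough illustration of "describing a new user in terms of features seen during training", the sketch below averages the latent vectors of known users per demographic group and gives a brand-new user the centroid of their group. This is a simplified stand-in, not the method from the paper cited above. The age buckets, user ids, and factor values are all made up for the example.

```python
from collections import defaultdict

def group_centroids(user_factors, user_groups):
    """Average the latent vectors of known users within each demographic group."""
    sums = {}
    counts = defaultdict(int)
    for user_id, vec in user_factors.items():
        group = user_groups[user_id]
        if group not in sums:
            sums[group] = [0.0] * len(vec)
        sums[group] = [s + x for s, x in zip(sums[group], vec)]
        counts[group] += 1
    return {g: [s / counts[g] for s in total] for g, total in sums.items()}

def score(user_vec, item_vec):
    """Predicted preference as the dot product of the two latent vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))

# Hypothetical factors from an already trained model.
user_factors = {"u1": [1.0, 0.0], "u2": [0.8, 0.2], "u3": [0.0, 1.0]}
user_groups = {"u1": "18-25", "u2": "18-25", "u3": "40-60"}
item_factors = {"action": [1.0, 0.0], "drama": [0.0, 1.0]}

centroids = group_centroids(user_factors, user_groups)

# A new user with no ratings, known only to fall in the "18-25" bucket.
new_user_vec = centroids["18-25"]
ranked = sorted(item_factors,
                key=lambda item: score(new_user_vec, item_factors[item]),
                reverse=True)
```

Once the user has rated a few items, the centroid can serve as a starting point that is then refined from their own ratings.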

is there an easy way of incorporating it into the model without re-training the whole model again from scratch?

Yes, there are actually attempts to achieve this. However, this is highly dependent on the factorization approach you are using. You can, for example, consider this paper:

Luo, Xin, Yunni Xia, and Qingsheng Zhu. "Incremental collaborative filtering recommender based on regularized matrix factorization." Knowledge-Based Systems 27 (2012): 271-280.

However, to the best of my knowledge, no implemented solution is available for Python.
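
That said, a simple approximation is easy to write by hand for SGD-trained models. The sketch below is not the algorithm from the paper above. It just freezes the trained item factors and runs SGD steps on the new user's vector only. The function name `sgd_fold_in`, the hyperparameters, and the toy factors are assumptions for illustration.

```python
import random

def sgd_fold_in(item_factors, ratings, n_factors, lr=0.05, reg=0.02,
                epochs=200, seed=0):
    """Learn a latent vector for one new user; item factors are never updated."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 0.1) for _ in range(n_factors)]   # small random start
    observed = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(observed)                            # visit ratings in random order
        for item_id, r in observed:
            v = item_factors[item_id]
            err = r - sum(a * b for a, b in zip(u, v))   # prediction error
            # Gradient step on the regularized squared error, user side only.
            u = [a + lr * (err * b - reg * a) for a, b in zip(u, v)]
    return u

# Hypothetical item factors from an already trained model.
item_factors = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [1.0, 1.0]}

# The new user's sparse ratings.
new_ratings = {"m1": 4.0, "m2": 2.0}

user_vec = sgd_fold_in(item_factors, new_ratings, n_factors=2)
predicted_m3 = sum(a * b for a, b in zip(user_vec, item_factors["m3"]))
```

Because only one short vector is updated, this costs a tiny fraction of a full retrain. The trade-off is that the item factors never learn from the new ratings until the next full training run.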

Is that my only option even if I have some ratings for this new user already?

If you have a few ratings for the individual user, additional information is often not needed to get practically useful results. However, the results vary greatly depending on the method. In such a case the basic matrix factorization models (e.g., Koren and Bell) do not perform very well. Consider using ranking-based MF approaches (e.g., LightFM - https://github.com/lyst/lightfm), which can, in addition, take content information into account.

J-H answered Oct 21 '22 21:10