Collaborative Filtering adding new users and items

Tags: machine-learning, collaborative-filtering, recommendation-engine

I'm building a recommendation engine for movies and have read a lot of the good information that's out there. One thing I never see mentioned is how to make recommendations for new users and items. The normal process goes: I build my model and train it. I then pass in a user along with the number k of top recommendations I want returned for them.

Now, what if I want to do this for a user that was not in my initial sparse ratings matrix? If I have a sparse array of movie ratings for this new user, is there an easy way of incorporating it into the model without re-training the whole model again from scratch?

I know content-based filtering is used to solve the "cold-start" problem of CF. Is that my only option even if I have some ratings for this new user already?

Right now I am looking into Weighted Alternating Least Squares (WALS), and eventually I'll want to do this for SGD as well.

811

asked Jul 28 '18 20:07

Sogun

2 Answers

I think what you are looking for is how to fold in a new user or item in matrix-factorization collaborative filtering. This was already discussed in "How can I handle new users/items in model generated by Spark ALS from MLlib?", which points to places where you can find example solutions (with some code examples). That discussion is about the Spark ALS implementation, but the main idea stays the same.
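To make the fold-in idea concrete, here is a minimal NumPy sketch, not the Spark ALS code from the linked discussion. It assumes you already have a trained item-factor matrix and simply solves the regularized least-squares problem for the new user's vector while keeping the item factors fixed. The names `fold_in_user`, `top_k`, `reg` and `weights` are made up for illustration, and the toy factors are invented.

```python
import numpy as np

def fold_in_user(item_factors, rated_items, ratings, reg=0.1, weights=None):
    """Solve for one new user's latent vector with item factors held fixed.

    This is the user half of a single (weighted) ALS sweep, restricted to the
    new user: minimize sum_i w_i * (r_i - v_i . u)^2 + reg * ||u||^2.
    """
    V = item_factors[rated_items]              # (n_rated, k) factors of rated items
    r = np.asarray(ratings, dtype=float)       # (n_rated,) observed ratings
    w = np.ones_like(r) if weights is None else np.asarray(weights, dtype=float)
    k = V.shape[1]
    Vw = V * w[:, None]                        # apply per-rating weights
    A = Vw.T @ V + reg * np.eye(k)             # (k, k) normal-equation matrix
    b = Vw.T @ r                               # (k,) right-hand side
    return np.linalg.solve(A, b)

def top_k(user_vec, item_factors, seen, k=3):
    """Rank items by predicted score, skipping items the user already rated."""
    scores = item_factors @ user_vec
    scores[list(seen)] = -np.inf
    return np.argsort(-scores)[:k]

# Toy item factors (hypothetical, standing in for a trained model).
item_factors = np.array([[1., 0.], [0., 1.], [1., 1.],
                         [2., 1.], [1., 2.], [0., 2.]])
rated = [0, 1, 2]
ratings = [2.0, 1.0, 3.0]                      # consistent with a user vector of (2, 1)

u = fold_in_user(item_factors, rated, ratings, reg=1e-8)
print(u)                                       # approximately [2, 1]
print(top_k(u, item_factors, rated, k=2))      # items 3 and 4 score highest among unseen
```

Folding in a new item works the same way with the roles swapped: keep the user factors fixed and solve for the item's vector from the ratings it has received. The rest of the model is untouched, so a periodic full retrain is still a reasonable thing to schedule.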

114

answered Oct 21 '22 23:10

Bartłomiej Twardowski

> One thing I never see mentioned is how to make recommendations for new users and items.

This is a difficult undertaking. In the case of a complete user cold start, additional data must be used to relate the new user to other (already known) users in advance. Typical approaches use, for example, demographic data to cluster users:

Safoury, Laila, and Akram Salah. "Exploiting user demographic attributes for solving cold-start problem in recommender system." Lecture Notes on Software Engineering 1.3 (2013): 303-307.

Basically, the trick when making suggestions for completely new users is to describe them in terms of features the algorithm has seen during the training phase. The same applies to a complete item cold start. Please note the difference between complete and partial cold-start problems. In the partial case, some information about the user/item is already available, and the question is whether it is "sufficient".
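As a rough illustration of describing a new user through features seen during training, the sketch below gives a brand-new user the mean latent vector of known users in the same demographic group. This is a deliberately simple stand-in, not the method from the cited paper; the function name, the age-group labels and the toy factors are all invented for the example.

```python
import numpy as np

def cold_start_user_vector(user_factors, user_groups, new_user_group):
    """Approximate a new user by the mean latent vector of known users
    in the same (hypothetical) demographic group.

    Falls back to the global mean when the group was never seen in training.
    """
    user_groups = np.asarray(user_groups)
    mask = user_groups == new_user_group
    if not mask.any():
        return user_factors.mean(axis=0)
    return user_factors[mask].mean(axis=0)

# Toy trained user factors and one demographic label per known user.
user_factors = np.array([[1., 0.], [3., 0.], [0., 2.], [0., 4.]])
user_groups = ["18-25", "18-25", "40-60", "40-60"]

vec = cold_start_user_vector(user_factors, user_groups, "18-25")
print(vec)  # mean of the first two rows
```

The resulting vector can be scored against the item factors like any trained user vector, and replaced by a proper fold-in once the user has rated a few items.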

> is there an easy way of incorporating it into the model without re-training the whole model again from scratch?

Yes, there are actually attempts to achieve this. However, this is highly dependent on the factorization approach you are using. You can, for example, consider this paper:

Luo, Xin, Yunni Xia, and Qingsheng Zhu. "Incremental collaborative filtering recommender based on regularized matrix factorization." Knowledge-Based Systems 27 (2012): 271-280.

However, to the best of my knowledge, no implemented solution is available for Python.
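If you train with SGD, a simple thing you can try yourself is to run SGD only on the new user's vector while leaving all trained item factors untouched. The sketch below shows that idea under toy assumptions. It is not the incremental algorithm from the Luo et al. paper, and `sgd_fold_in_user` and its hyperparameters are hypothetical names and values chosen for the example.

```python
import numpy as np

def sgd_fold_in_user(item_factors, rated_items, ratings,
                     lr=0.05, reg=0.01, epochs=200, seed=0):
    """Learn a latent vector for one new user by SGD, item factors frozen."""
    rng = np.random.default_rng(seed)
    rated_items = np.asarray(rated_items)
    ratings = np.asarray(ratings, dtype=float)
    k = item_factors.shape[1]
    p = rng.normal(scale=0.1, size=k)          # small random initial user vector
    for _ in range(epochs):
        # Visit the user's ratings in a fresh random order each epoch.
        for idx in rng.permutation(len(ratings)):
            q = item_factors[rated_items[idx]]
            err = ratings[idx] - p @ q          # prediction error on this rating
            p += lr * (err * q - reg * p)       # gradient step on the user vector only
    return p

# Toy item factors (hypothetical, standing in for a trained model).
item_factors = np.array([[1., 0.], [0., 1.], [1., 1.],
                         [2., 1.], [1., 2.], [0., 2.]])
p = sgd_fold_in_user(item_factors, [0, 1, 2], [2.0, 1.0, 3.0])
print(p)            # learned vector for the new user
print(item_factors @ p)  # predicted scores for every item
```

Because only one small vector is updated, this is cheap enough to run per request. The learning rate and number of epochs here are tuned to the toy data and would need adjusting for real factors.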

> Is that my only option even if I have some ratings for this new user already?

If you have a few ratings for each individual user, it is often not necessary to use additional information to get practically useful results. However, the results vary greatly depending on the method. In such a case the basic matrix factorization models (e.g., Koren and Bell) do not perform very well. Consider using ranking-based MF approaches (e.g., LightFM, https://github.com/lyst/lightfm), which can also take content information into account.

answered Oct 21 '22 21:10

J-H
