
Spark MLLib Collaborative Filtering with new user

I'm trying out the Collaborative Filtering algorithm implemented in Spark and am running into the following issue:

Suppose I train a model with the following data:

u1|p1|3
u1|p2|3
u2|p1|2
u2|p2|3

Now if I test it with the following data:

u1|p1|1
u3|p1|2
u3|p2|3

I never see any ratings for the user 'u3', presumably because that user does not appear in the training data. Is this because of the cold start issue? I was under the impression that this issue would apply only to a new product. In this case, I would have expected a prediction for 'u3' since 'u1' and 'u2' in the training data have similar rating information to 'u3'. Is this the distinction between model-based and memory-based collaborative filtering?

Navin Viswanath asked Oct 31 '22 08:10


1 Answer

I assume you are talking about the ALS algorithm?

'u3' is not part of your training set, so your model does not know anything about that user. All one could do is maybe return the mean rating over all users.
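As a rough illustration of that fallback, here is a minimal plain-Python sketch (no Spark involved). It uses the training rows from the question. The helper name `predict_with_fallback` and the `model_predict` callable are hypothetical stand-ins for whatever trained model you actually have; they are not part of MLlib.

```python
# Training ratings from the question, as (user, product, rating) triples.
train = [
    ("u1", "p1", 3.0),
    ("u1", "p2", 3.0),
    ("u2", "p1", 2.0),
    ("u2", "p2", 3.0),
]

# Users the model has actually seen during training.
known_users = {user for user, _, _ in train}

# Mean of all training ratings, used as the fallback for unseen users.
global_mean = sum(rating for _, _, rating in train) / len(train)


def predict_with_fallback(user, product, model_predict):
    """Hypothetical helper: ask the model only about users it was trained on.

    `model_predict` is an assumed callable (user, product) -> rating that
    stands in for the trained model.
    """
    if user in known_users:
        return model_predict(user, product)
    # Unseen user such as 'u3': nothing user-specific is known.
    return global_mean
```

With this data the fallback for 'u3' is the same for every product, which is why it is only a baseline and not a personalised prediction.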

Looking at the Spark 1.3.0 Scala code: the MatrixFactorizationModel returned by ALS.train() looks up the user and the product in its feature vectors when you call predict(). When I try to predict a rating for an unknown user, I get a NoSuchElementException. It is simply implemented that way.
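To see why the lookup fails, here is a simplified plain-Python sketch of how a factorization model produces a prediction. This is not the Spark code. The factor values are made up for illustration (a real ALS run would learn them), and `predict` and `safe_predict` are hypothetical names. Python's KeyError plays the role that NoSuchElementException plays in the Scala code.

```python
# Hypothetical rank-2 latent factors; a trained model would have learned these.
user_features = {"u1": [1.2, 0.9], "u2": [0.8, 1.1]}
product_features = {"p1": [1.0, 1.1], "p2": [1.3, 1.2]}


def predict(user, product):
    """Dot product of the user's and the product's factor vectors.

    Raises KeyError if either id was never seen in training, because
    there is simply no factor vector to look up.
    """
    u = user_features[user]
    p = product_features[product]
    return sum(a * b for a, b in zip(u, p))


def safe_predict(user, product):
    """Return None instead of raising when an id has no factor vector."""
    if user not in user_features or product not in product_features:
        return None
    return predict(user, product)
```

Here `safe_predict("u3", "p1")` returns None, while `predict("u3", "p1")` raises, since 'u3' has no entry in `user_features`. The caller then decides what to do with the missing prediction.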

stholzm answered Nov 15 '22 11:11