I'm trying out the Collaborative Filtering algorithm implemented in Spark and am running into the following issue:
Suppose I train a model with the following data:
u1|p1|3
u1|p2|3
u2|p1|2
u2|p2|3
Now if I test it with the following data:
u1|p1|1
u3|p1|2
u3|p2|3
I never see any ratings for the user 'u3', presumably because that user does not appear in the training data. Is this because of the cold start issue? I was under the impression that this issue would apply only to a new product. In this case, I would have expected a prediction for 'u3' since 'u1' and 'u2' in the training data have similar rating information to 'u3'. Is this the distinction between model-based and memory-based collaborative filtering?
I assume you are talking about the ALS algorithm?
'u3' is not part of your training set, so your model does not know anything about that user. All one could do is perhaps return the mean rating over all users.
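To make the mean-rating fallback concrete, here is a minimal Python sketch of a factor-lookup predictor. It is not Spark's API. The class name `ToyFactorModel` is hypothetical, and the two-dimensional factors are hand-picked stand-ins for what ALS would learn: they reproduce the four training ratings above exactly. A user or product without a learned factor falls back to the global mean of the training ratings.

```python
from statistics import mean

# Training ratings from the question: (user, product, rating).
TRAIN = [("u1", "p1", 3.0), ("u1", "p2", 3.0), ("u2", "p1", 2.0), ("u2", "p2", 3.0)]


class ToyFactorModel:
    """Hypothetical stand-in for a trained matrix factorization model."""

    def __init__(self, user_factors, product_factors, train):
        self.user_factors = user_factors
        self.product_factors = product_factors
        # Global mean over all training ratings, used for IDs never seen in training.
        self.global_mean = mean(r for _, _, r in train)

    def predict(self, user, product):
        u = self.user_factors.get(user)
        p = self.product_factors.get(product)
        if u is None or p is None:
            # Cold start: no latent factors were learned for this ID.
            return self.global_mean
        # Model-based prediction: dot product of the latent factor vectors.
        return sum(a * b for a, b in zip(u, p))


# Hand-picked factors (NOT the output of ALS), chosen so u.p matches each training rating.
model = ToyFactorModel(
    user_factors={"u1": [1.5, 1.5], "u2": [0.5, 1.5]},
    product_factors={"p1": [1.0, 1.0], "p2": [0.0, 2.0]},
    train=TRAIN,
)

print(model.predict("u1", "p1"))  # 3.0
print(model.predict("u3", "p1"))  # 2.75, the mean of 3, 3, 2, 3
```

Note that u3's own test ratings are never consulted. The model only holds parameters for IDs seen during training, so the best it can offer u3 is a non-personalized fallback.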
Looking into the Spark 1.3.0 Scala code: the MatrixFactorizationModel returned by ALS.train() tries to look up the user and product in its feature vectors when you call predict(). I get a NoSuchElementException when I try to predict a rating for an unknown user. It is simply implemented that way.