
Apache Spark ALS Recommendation Rating values higher than range

I ran a small ALS recommender system program, as found on the Apache Spark website, which uses MLlib. When given a dataset with ratings of 1-5 (I used the MovieLens dataset), it produces recommendations with predicted ratings above 5!

The highest I've found in my limited testing is 7.4. Obviously, I am either misunderstanding what the code is meant to do, or something has gone awry. I have looked into Latent Factor Recommender Systems and was under the impression that the Spark MLlib ALS implementation was based on this one.

Why would it return ratings higher than what is possible? It makes no sense.

Have I misunderstood the algorithm or is the program flawed?

asked Mar 14 '15 at 16:03 by monster



1 Answer

You're looking at the right paper, but I think you are expecting the algorithm to do something it is not intended to do. It produces a low-rank approximation of your input as the product of two matrices, and nothing about multiplying matrices clamps the output values to the range of the input.
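To make that concrete, here is a minimal sketch in plain Python. It does not use Spark, and the factor values and the names `user_factors`, `item_factors`, and `predict` are made up for illustration. It only shows that a prediction is a dot product of two factor vectors, which has no built-in upper bound.

```python
# Hypothetical rank-2 latent factors; the numbers are invented for illustration
# and are not the output of any real ALS run.
user_factors = {"alice": [1.9, 0.4], "bob": [0.3, 1.1]}
item_factors = {"movie_a": [2.6, 0.5], "movie_b": [0.4, 2.0]}


def predict(user, item):
    """Predicted rating = dot product of the user and item factor vectors."""
    return sum(u * v for u, v in zip(user_factors[user], item_factors[item]))


# Print every user/item prediction; nothing restricts these to the 1-5 range.
for user in sorted(user_factors):
    for item in sorted(item_factors):
        print(f"{user} / {item}: {predict(user, item):.2f}")
```

With these made-up factors, the prediction for alice and movie_a comes out at about 5.14, even though every individual factor is a modest number.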

You can clamp or round the values yourself. You may not want to, though, because the unclamped value gives you extra information about how much stronger than 5 the predicted rating is. I suppose it's also not technically possible for the algorithm to assume that the maximum possible value is the maximum value observed in the input.
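A minimal sketch of clamping and rounding as a post-processing step, in plain Python. The helper name `clamp_rating` and the 1-5 bounds are assumptions for this example; 7.4 is the value from the question, and the other raw values are made up.

```python
def clamp_rating(predicted, low=1.0, high=5.0):
    """Limit a raw predicted rating to the assumed valid range [low, high]."""
    return max(low, min(high, predicted))


# 7.4 is the value reported in the question; the other two are illustrative.
raw_predictions = [7.4, 4.2, 0.3]

clamped = [clamp_rating(r) for r in raw_predictions]
rounded = [round(r) for r in clamped]

print(clamped)
print(rounded)
```

If the predictions are only used to rank items, it may be better to sort by the raw values and clamp only for display, since clamping makes every prediction above 5 look identical.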

answered Oct 17 '22 at 04:10 by Sean Owen