I've run a small ALS recommender system program from the Apache Spark website, which uses MLlib. When using a dataset with ratings of 1-5 (I used the MovieLens dataset), it gives recommendations with predicted ratings of over 5!
The highest I've found in my limited testing is 7.4. Obviously, I am either misunderstanding what the code is meant to do, or something has gone awry. I have researched latent factor recommender systems and was under the impression that the Spark MLlib ALS implementation was based on that model.
Why would it return ratings higher than what is possible? It makes no sense.
Have I misunderstood the algorithm or is the program flawed?
rank is the number of features to use (also referred to as the number of latent factors). iterations is the number of iterations of ALS to run. ALS typically converges to a reasonable solution in 20 iterations or less. lambda specifies the regularization parameter in ALS.
The most important hyper-parameters in Alternating Least Squares (ALS):
- maxIter: the maximum number of iterations to run (defaults to 10)
- rank: the number of latent factors in the model (defaults to 10)
- regParam: the regularization parameter in ALS (defaults to 1.0)
The ALS recommender is a matrix factorization algorithm that uses Alternating Least Squares with Weighted-Lambda-Regularization (ALS-WR). It factors the user-to-item matrix A into the user-to-feature matrix U and the item-to-feature matrix M, and it runs the ALS algorithm in a parallel fashion.
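To make the factorization concrete, here is a minimal plain-Python sketch (the rank-2 factor values are hypothetical, not taken from any real model): each predicted rating is the dot product of a user's factor row and an item's factor row, so nothing constrains the result to the original 1-5 rating scale.

```python
# Minimal rank-2 factorization sketch with hypothetical factor values.
# A predicted rating is the dot product of a user-factor row and an
# item-factor row; the product is unbounded by the 1-5 rating scale.

U = {"alice": [2.0, 1.5]}   # user-to-feature matrix (one row per user)
M = {"movie": [2.0, 1.0]}   # item-to-feature matrix (one row per item)

def predict(user, item):
    # Dot product of the two factor vectors.
    return sum(u * m for u, m in zip(U[user], M[item]))

print(predict("alice", "movie"))  # 5.5 -- already above the 1-5 scale
```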
alpha is a parameter applicable to the implicit feedback variant of ALS that governs the baseline confidence in preference observations (defaults to 1.0). nonnegative specifies whether or not to use nonnegative constraints for least squares (defaults to false).
You're looking at the right paper, but I think you are expecting the algorithm to do something it is not intended to do. It is producing a low-rank approximation to your input as the product of two matrices, but nothing about multiplying matrices clamps the output values.
You can clamp or round the values, but you may not want to, because the raw value carries extra information about how much stronger than 5 the predicted rating is. It would also not be sound for the algorithm to assume that the maximum observed value in the input is the maximum possible value.
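If you do want predictions back on the original scale, a simple post-processing clamp works; here is a sketch assuming a 1-5 rating scale (the function name is mine, not part of any Spark API):

```python
def clamp_rating(pred, lo=1.0, hi=5.0):
    """Clip a raw ALS prediction to the valid rating range."""
    return max(lo, min(hi, pred))

print(clamp_rating(7.4))   # 5.0
print(clamp_rating(0.3))   # 1.0
print(clamp_rating(3.8))   # 3.8 (in-range values pass through)
```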