I've run a small ALS recommender system program from the Apache Spark website, which uses MLlib. When using a dataset with ratings of 1-5 (I used the MovieLens dataset), it gives recommendations with predicted ratings of over 5!
The highest I've found in my limited testing is 7.4. Obviously, I am either misunderstanding what the code is meant to do, or something has gone awry. I have researched latent factor recommender systems and was under the impression that the Spark MLlib ALS implementation was based on that model.
Why would it return ratings higher than what is possible? It makes no sense.
Have I misunderstood the algorithm or is the program flawed?
rank is the number of features to use (also referred to as the number of latent factors). iterations is the number of iterations of ALS to run. ALS typically converges to a reasonable solution in 20 iterations or less. lambda specifies the regularization parameter in ALS.
The most important hyper-parameters in Alternating Least Squares (ALS):
- maxIter: the maximum number of iterations to run (defaults to 10)
- rank: the number of latent factors in the model (defaults to 10)
- regParam: the regularization parameter in ALS (defaults to 1.0)
The ALS recommender is a matrix factorization algorithm that uses Alternating Least Squares with Weighted-Lambda-Regularization (ALS-WR). It factors the user-to-item matrix A into the user-to-feature matrix U and the item-to-feature matrix M, and it runs the ALS algorithm in a parallel fashion.
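To make the factorization concrete, here is a minimal plain-Python sketch (the rank-2 factor values are hypothetical, not taken from any real model): each predicted rating is the dot product of a user's factor row and an item's factor row, so nothing constrains the result to the original 1-5 rating scale.

```python
# Minimal rank-2 factorization sketch with hypothetical factor values.
# A predicted rating is the dot product of a user-factor row and an
# item-factor row; the product is unbounded by the 1-5 rating scale.

U = {"alice": [2.0, 1.5]}   # user-to-feature matrix (one row per user)
M = {"movie": [2.0, 1.0]}   # item-to-feature matrix (one row per item)

def predict(user, item):
    # Dot product of the two factor vectors.
    return sum(u * m for u, m in zip(U[user], M[item]))

print(predict("alice", "movie"))  # 5.5 -- already above the 1-5 scale
```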
alpha is a parameter applicable to the implicit feedback variant of ALS that governs the baseline confidence in preference observations (defaults to 1.0). nonnegative specifies whether or not to use nonnegative constraints for least squares (defaults to false).
You're looking at the right paper, but I think you are expecting the algorithm to do something it is not intended to do. It is producing a low-rank approximation to your input as the product of two matrices, but nothing about multiplying matrices clamps the output values.
You can clamp or round the values, but you may not want to, because the raw value carries extra information about how much stronger than 5 the predicted rating is. It would also not be sound for the algorithm to assume that the maximum observed value in the input is the maximum possible value.
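If you do want predictions back on the original scale, a simple post-processing clamp works; here is a sketch assuming a 1-5 rating scale (the function name is mine, not part of any Spark API):

```python
def clamp_rating(pred, lo=1.0, hi=5.0):
    """Clip a raw ALS prediction to the valid rating range."""
    return max(lo, min(hi, pred))

print(clamp_rating(7.4))   # 5.0
print(clamp_rating(0.3))   # 1.0
print(clamp_rating(3.8))   # 3.8 (in-range values pass through)
```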