How can you evaluate the implicit feedback collaborative filtering algorithm of Apache Spark, given that the implicit "ratings" can vary from zero to anything, so a simple MSE or RMSE does not have much meaning?
We then train an ALS model, which by default assumes that the ratings are explicit (implicitPrefs is false), and evaluate the recommendation model by measuring the root-mean-square error of rating prediction. Refer to the ALS Scala docs for more details on the API.
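For illustration, here is a minimal sketch of that explicit-ratings baseline using the spark.ml API, assuming a DataFrame named ratings with columns userId, itemId, and rating (all names are illustrative):

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS

// Hold out part of the data for evaluation.
val Array(training, test) = ratings.randomSplit(Array(0.8, 0.2))

val als = new ALS()
  .setMaxIter(10)
  .setRegParam(0.1)
  .setUserCol("userId")
  .setItemCol("itemId")
  .setRatingCol("rating")       // implicitPrefs defaults to false (explicit ratings)
  .setColdStartStrategy("drop") // drop NaN predictions for users/items unseen in training

val model = als.fit(training)
val predictions = model.transform(test)

// Root-mean-square error of the predicted ratings.
val evaluator = new RegressionEvaluator()
  .setMetricName("rmse")
  .setLabelCol("rating")
  .setPredictionCol("prediction")
println(s"RMSE = ${evaluator.evaluate(predictions)}")
```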
The ALS recommender is a matrix factorization algorithm that uses Alternating Least Squares with Weighted-Lambda-Regularization (ALS-WR). It factors the user-to-item matrix A into a user-to-feature matrix U and an item-to-feature matrix M, and it runs the ALS algorithm in a parallel fashion.
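Concretely, ALS-WR factors A ≈ UᵀM by minimizing the standard weighted-lambda-regularized squared error:

```latex
\min_{U,M} \sum_{(u,i) \in I} \left( a_{ui} - \mathbf{u}_u^\top \mathbf{m}_i \right)^2
  + \lambda \left( \sum_{u} n_u \lVert \mathbf{u}_u \rVert^2 + \sum_{i} n_i \lVert \mathbf{m}_i \rVert^2 \right)
```

where I is the set of observed entries and n_u and n_i count the ratings of user u and item i, which is what makes the regularization "weighted".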
Implicit data collection. In order to make any recommendations, the system has to collect data. The ultimate goal of collecting the data is to get an idea of user preferences, which can later be used to make predictions about future user preferences.
rank is the number of features to use (also referred to as the number of latent factors). iterations is the number of iterations of ALS to run. ALS typically converges to a reasonable solution in 20 iterations or less. lambda specifies the regularization parameter in ALS.
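As a quick illustration of these parameters with the RDD-based MLlib API (the values below are arbitrary, and ratings is assumed to be an RDD[Rating]):

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// rank = 10, iterations = 20, lambda = 0.01 -- illustrative values only.
val explicitModel = ALS.train(ratings, 10, 20, 0.01)

// The implicit-feedback variant takes an extra confidence parameter, alpha.
val implicitModel = ALS.trainImplicit(ratings, 10, 20, 0.01, 40.0)
```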
To answer this question, you'll need to go back to the original paper that defined implicit feedback and the ALS algorithm for it: Collaborative Filtering for Implicit Feedback Datasets by Yifan Hu, Yehuda Koren, and Chris Volinsky.
What is implicit feedback?
In the absence of explicit ratings, recommender systems can infer user preferences from the more abundant implicit feedback, which indirectly reflects opinion through observed user behavior.
Implicit feedback can include purchase history, browsing history, search patterns, or even mouse movements.
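For example, here is a minimal sketch of turning raw click events into implicit "ratings", assuming a DataFrame named events with one row per click and columns userId and itemId (names are illustrative):

```scala
import org.apache.spark.sql.functions._

// Use the click count per (user, item) pair as the implicit rating.
val implicitRatings = events
  .groupBy("userId", "itemId")
  .agg(count(lit(1)).as("rating"))
```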
Do the same evaluation techniques, such as RMSE and MSE, apply here?
It is important to realize that we do not have reliable feedback about which items are disliked: the absence of a click or purchase can have many possible causes. We also can't track user reactions to our recommendations.
Thus, precision-based metrics such as RMSE and MSE are not very appropriate, as they require knowing which items users dislike in order to be meaningful.
However, purchasing or clicking on an item is an indication of interest in it. I wouldn't go as far as saying "liking", because a click or a purchase can mean different things depending on the context of the recommender.
This makes recall-oriented measures applicable in this case. Under this scenario, several metrics have been introduced, the most important being the Mean Percentage Ranking (MPR), also known as Percentile Ranking.
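As defined in the Hu, Koren, and Volinsky paper, MPR is the test-feedback-weighted average of percentile ranks:

```latex
\overline{\mathrm{rank}} = \frac{\sum_{u,i} r^{t}_{ui} \, \mathrm{rank}_{ui}}{\sum_{u,i} r^{t}_{ui}}
```

where rank_ui is the percentile rank of item i within user u's ordered recommendation list (0% for the most recommended item, 100% for the least recommended), and r^t_ui is the feedback observed for the pair in the test period.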
Lower values of MPR are more desirable. The expected value of MPR for random predictions is 50%, and thus MPR > 50% indicates an algorithm no better than random.
Of course, it's not the only way to evaluate recommender systems with implicit feedback, but it's the most common one used in practice.
For more information about this metric, I advise you to read the paper cited above.
OK, now we know what we are going to use, but what about Apache Spark?
Apache Spark still doesn't provide an out-of-the-box implementation of this metric, but hopefully not for long: there is a PR awaiting review, https://github.com/apache/spark/pull/16618, which adds a RankingEvaluator to spark-ml.
The implementation nevertheless isn't complicated. You can refer to the code in that pull request if you are interested in getting it sooner.
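In the meantime, here is a minimal sketch of how MPR could be computed with the DataFrame API, assuming a trained spark.ml ALSModel named model and a test DataFrame test with columns userId, itemId, and rating (all names are illustrative). Note that the candidate set here is limited to items seen in the test split, which only approximates ranking over the full catalog, and that scoring the full user × item cross product can be expensive:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Score every (user, item) pair over the test split's item universe.
val allPairs = test.select("userId").distinct()
  .crossJoin(test.select("itemId").distinct())
val scored = model.transform(allPairs)

// percent_rank() yields 0.0 for the top-scored item and 1.0 for the
// bottom one when ordering by predicted score descending.
val w = Window.partitionBy("userId").orderBy(desc("prediction"))
val ranked = scored.withColumn("percentRank", percent_rank().over(w))

// Weight each held-out interaction by its observed feedback r_ui.
val mpr = test.join(ranked, Seq("userId", "itemId"))
  .agg(sum(col("rating") * col("percentRank")) / sum(col("rating")))
  .first()
  .getDouble(0)

println(s"MPR = $mpr") // lower is better; ~0.5 is no better than random
```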
I hope this answers your question.