How can I evaluate the implicit feedback ALS algorithm for recommendations in Apache Spark?

Tags: apache-spark, apache-spark-mllib

How can you evaluate the implicit feedback collaborative filtering algorithm of Apache Spark, given that the implicit "ratings" can vary from zero to anything, so a simple MSE or RMSE does not have much meaning?


asked Sep 28 '17 06:09

Dimitris Poulopoulos

1 Answer

To answer this question, you'll need to go back to the original paper that defined implicit feedback and the ALS algorithm for it: "Collaborative Filtering for Implicit Feedback Datasets" by Yifan Hu, Yehuda Koren and Chris Volinsky.

What is implicit feedback?

In the absence of explicit ratings, recommender systems can infer user preferences from the more abundant implicit feedback, which indirectly reflects opinion through observed user behavior.

Implicit feedback can include purchase history, browsing history, search patterns, or even mouse movements.

Do the same evaluation techniques, such as RMSE and MSE, apply here?

It is important to realize that we do not have reliable feedback about which items are disliked. The absence of a click or purchase can have many causes. We also can't track user reactions to our recommendations.

Thus, precision-based metrics, and error metrics such as RMSE and MSE, are not very appropriate here: they only make sense when we know which items users dislike.

However, purchasing or clicking on an item is an indication of interest in it. I wouldn't say "like", because a click or a purchase can mean different things depending on the context of the recommender.

This makes recall-oriented measures applicable. Under this scenario, several metrics have been introduced, the most important being the Mean Percentage Ranking (MPR), also known as Percentile Ranking.

Lower values of MPR are more desirable. The expected value of MPR for random predictions is 50%, and thus MPR ≥ 50% indicates an algorithm no better than random.
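As a rough illustration of the metric (a minimal sketch in plain Python, not Spark's API): MPR is the average of the percentile ranks of the held-out items in each user's ranked recommendation list, weighted by the held-out feedback, where 0% is the top of the list and 100% the bottom. The function names and the dict-based inputs below are hypothetical; in practice the scores would come from the fitted ALS model.

```python
def percentile_ranks(user_scores):
    """Map each item to its percentile rank in [0, 1] for one user.

    user_scores: {item: predicted_score}. The highest score gets 0.0 (top
    of the list) and the lowest gets 1.0 (bottom of the list).
    """
    ranked = sorted(user_scores, key=user_scores.get, reverse=True)
    denom = max(len(ranked) - 1, 1)  # avoid division by zero for one item
    return {item: pos / denom for pos, item in enumerate(ranked)}


def mean_percentile_rank(predicted, held_out):
    """Weighted mean percentile rank over held-out implicit feedback.

    predicted: {user: {item: predicted_score}} for all candidate items
    held_out:  {user: {item: observed_feedback}} from the test period
    (both structures are assumptions made for this sketch)
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for user, items in held_out.items():
        ranks = percentile_ranks(predicted[user])
        for item, weight in items.items():
            weighted_sum += weight * ranks[item]
            total_weight += weight
    if total_weight == 0:
        raise ValueError("held_out contains no feedback")
    return weighted_sum / total_weight


# Toy data: u1's held-out item "a" is ranked first, u2's item "c" is in the middle.
predicted = {
    "u1": {"a": 0.9, "b": 0.5, "c": 0.1},
    "u2": {"a": 0.2, "b": 0.8, "c": 0.4},
}
held_out = {"u1": {"a": 3.0}, "u2": {"c": 1.0}}
print(mean_percentile_rank(predicted, held_out))  # 0.125, i.e. 12.5%
```

Weighting by the observed feedback means that items a user interacted with heavily count more toward the score than items they touched once.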

Of course, it's not the only way to evaluate recommender systems with implicit ratings but it's the most common one used in practice.

For more information about this metric, I advise you to read the paper stated above.

OK, now we know which metric to use, but what about Apache Spark?

Apache Spark still doesn't provide an out-of-the-box implementation of this metric, but hopefully not for long: there is a PR awaiting review (https://github.com/apache/spark/pull/16618) that adds a RankingEvaluator to spark-ml.

Nevertheless, the implementation isn't complicated. You can refer to the code here if you are interested in getting it sooner.

I hope this answers your question.

answered Sep 29 '22 23:09

eliasah
