So I am building an implicit feedback recommender model with Spark 1.0.0 and I am trying to follow the example they have on their collaborative filtering page: http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#explicit-vs-implicit-feedback
And I even have the test dataset loaded up which they reference in the example: http://codesearch.ruethschilling.info/xref/apache-foundation/spark/mllib/data/als/test.data
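For context, the ratings RDD is built as in the loading snippet from that page (a sketch; the path is wherever test.data lives for you):

import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.Rating

// Load and parse the data: each line is "user,item,rating"
val data = sc.textFile("mllib/data/als/test.data")
val ratings = data.map(_.split(',') match {
  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
})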
However, when I try to run the implicit feedback model:

val alpha = 0.01
val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)
(the ratings are exactly those from their dataset, with rank = 10 and numIterations = 20) I am getting the following error:
scala> val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)
<console>:26: error: overloaded method value trainImplicit with alternatives:
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,alpha: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,blocks: Int,alpha: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,blocks: Int,alpha: Double,seed: Long)org.apache.spark.mllib.recommendation.MatrixFactorizationModel
cannot be applied to (org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating], Int, Int, Double)
val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)
Interestingly, the model trains just fine when NOT using trainImplicit (i.e., plain ALS.train).
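For instance, this explicit-feedback call compiles and trains without complaint (a sketch; the fourth argument is interpreted as lambda here, since ALS.train has a (ratings, rank, iterations, lambda) overload):

val model = ALS.train(ratings, rank, numIterations, 0.01)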
Alternating Least Squares (ALS) is also a matrix factorization algorithm, and it runs in a parallel fashion. ALS is implemented in Apache Spark ML and built for large-scale collaborative filtering problems.
regParam specifies the regularization parameter in ALS (defaults to 1.0). implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data (defaults to false which means using explicit feedback).
rank is the number of features to use (also referred to as the number of latent factors). iterations is the number of iterations of ALS to run. ALS typically converges to a reasonable solution in 20 iterations or less. lambda specifies the regularization parameter in ALS.
The ALS recommender is a matrix factorization algorithm that uses Alternating Least Squares with Weighted-Lambda-Regularization (ALS-WR). It factors the user-to-item matrix A into the user-to-feature matrix U and the item-to-feature matrix M, and it runs the ALS algorithm in a parallel fashion.
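For completeness, here is how those knobs look in the newer DataFrame-based spark.ml API (not available in Spark 1.0.0) -- a minimal sketch, assuming a DataFrame ratingsDF with userId, itemId and rating columns:

import org.apache.spark.ml.recommendation.ALS

val als = new ALS()
  .setRank(10)             // number of latent factors
  .setMaxIter(20)          // iterations of ALS
  .setRegParam(0.01)       // regularization parameter (lambda)
  .setImplicitPrefs(true)  // use the implicit-feedback variant
  .setAlpha(0.01)          // confidence scaling for implicit data
  .setUserCol("userId")
  .setItemCol("itemId")
  .setRatingCol("rating")
val model = als.fit(ratingsDF)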
The example seems to be out of sync with the implementation, as there are no overloads of trainImplicit with four parameters -- which is what the error message is telling you. However, if you look at the Scala source code for ALS you'll see that the three-parameter overload is implemented in terms of the six-parameter overload via some 'magic numbers':
def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int)
  : MatrixFactorizationModel = {
  // lambda = 0.01, blocks = -1 (auto-configure), alpha = 1.0
  trainImplicit(ratings, rank, iterations, 0.01, -1, 1.0)
}
This suggests that 0.01 is a decent default value for lambda. (It would be worth checking with someone who has a deeper understanding of ML.) This may give you enough information to put together a reasonable call to the five- or six-parameter overload. (Of course, if you know enough to pick better values, that's great!)
For example:
val model = ALS.trainImplicit(ratings, rank, numIterations, 0.01, alpha)
or
val model = ALS.trainImplicit(ratings, rank, numIterations, 0.01, -1, alpha)
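(In the six-parameter call, blocks = -1 tells ALS to auto-configure the number of parallel blocks, matching what the three-parameter overload passes internally.)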
Finally, you may not realize that there is pretty decent API documentation for ALS.