
Spark MLlib - Collaborative Filtering Implicit Feedback

So I am building an implicit feedback recommender model with Spark 1.0.0 and I am trying to follow the example they have on their collaborative filtering page: http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#explicit-vs-implicit-feedback

And I even have the test dataset loaded up which they reference in the example: http://codesearch.ruethschilling.info/xref/apache-foundation/spark/mllib/data/als/test.data

However, when I try to run the implicit feedback model:

val alpha = 0.01
val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)

(the ratings were exactly the ratings from their dataset, with rank = 10 and numIterations = 20), I get the following error:

scala> val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)
<console>:26: error: overloaded method value trainImplicit with alternatives:
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,alpha: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,blocks: Int,alpha: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,blocks: Int,alpha: Double,seed: Long)org.apache.spark.mllib.recommendation.MatrixFactorizationModel
cannot be applied to (org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating], Int, Int, Double)
val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)

Interestingly, the model runs just fine when NOT using trainImplicit (i.e. when calling ALS.train with the same arguments).

atellez asked Sep 03 '14 at 16:09


People also ask

Is ALS collaborative filtering?

Alternating Least Squares (ALS) is a matrix factorization algorithm that runs in parallel. ALS is implemented in Apache Spark ML and built for large-scale collaborative filtering problems.

What is regParam in ALS?

regParam specifies the regularization parameter in ALS (defaults to 1.0). implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data (defaults to false which means using explicit feedback).

What is rank in ALS?

rank is the number of features to use (also referred to as the number of latent factors). iterations is the number of iterations of ALS to run. ALS typically converges to a reasonable solution in 20 iterations or less. lambda specifies the regularization parameter in ALS.

What is ALS recommender system?

The ALS recommender is a matrix factorization algorithm that uses Alternating Least Squares with Weighted-Lambda-Regularization (ALS-WR). It factors the user-to-item matrix A into the user-to-feature matrix U and the item-to-feature matrix M, and it runs the ALS algorithm in parallel.


1 Answer

The example seems to be out of sync with the implementation: there is no overload of trainImplicit with four parameters, which is what the error message is telling you. However, if you look at the Scala source code for ALS, you'll see that the three-parameter overload is implemented in terms of the six-parameter overload via some 'magic numbers':

def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int)
    : MatrixFactorizationModel = {
    trainImplicit(ratings, rank, iterations, 0.01, -1, 1.0)
}

This suggests that 0.01 is a reasonable default for lambda (worth confirming with someone who has a deeper understanding of ML). The other two magic numbers are blocks = -1, which lets Spark choose the level of parallelism automatically, and alpha = 1.0. That may be enough information to put together a sensible call to the five- or six-parameter overload. Of course, if you know enough to pick better values, that's great!

For example:

val model = ALS.trainImplicit(ratings, rank, numIterations, 0.01, alpha)

or

val model = ALS.trainImplicit(ratings, rank, numIterations, 0.01, -1, alpha)
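Put together, a fuller spark-shell session might look like the sketch below. It assumes the shell was started from the Spark source root, so that `sc` exists and the sample file sits at mllib/data/als/test.data. The name `modelViaSetters` is only illustrative. The second half shows the setter-based form of the same configuration, which avoids having to remember the positional order of lambda, blocks and alpha.

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Assumption: running inside spark-shell, so `sc` is already defined,
// and the working directory is the Spark source root.
val data = sc.textFile("mllib/data/als/test.data")

// Each line of the sample file has the form "user,item,rating".
val ratings = data.map(_.split(',') match {
  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
})

val rank = 10
val numIterations = 20
val lambda = 0.01 // the value the three-parameter overload passes internally
val alpha = 0.01  // the value from the question

// Five-parameter overload: (ratings, rank, iterations, lambda, alpha)
val model = ALS.trainImplicit(ratings, rank, numIterations, lambda, alpha)

// The same settings expressed through setters instead of positional arguments.
val modelViaSetters = new ALS()
  .setRank(rank)
  .setIterations(numIterations)
  .setLambda(lambda)
  .setAlpha(alpha)
  .setImplicitPrefs(true)
  .run(ratings)
```

Both calls use the same hyperparameters, but the factor matrices are initialised randomly, so the two models are not guaranteed to be numerically identical.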

Finally, you may not realize that there is pretty decent API documentation for ALS.
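As for why ALS.train accepted the same four arguments: ALS.train does have a four-parameter overload, (ratings, rank, iterations, lambda), so the value you named alpha was silently used as lambda there. The sketch below reproduces the situation without Spark. `AlsStub` is a hypothetical stand-in that only mimics the parameter shapes (it is not the real MLlib object), and the ratings are a plain Seq[Double] for illustration.

```scala
// Hypothetical stand-in for the MLlib ALS object: same parameter shapes, no Spark needed.
object AlsStub {
  // Mirrors ALS.train(ratings, rank, iterations, lambda)
  def train(ratings: Seq[Double], rank: Int, iterations: Int, lambda: Double): String =
    s"train(lambda=$lambda)"

  // Mirrors the three-parameter trainImplicit, which fills in default values
  def trainImplicit(ratings: Seq[Double], rank: Int, iterations: Int): String =
    trainImplicit(ratings, rank, iterations, 0.01, 1.0)

  // Mirrors trainImplicit(ratings, rank, iterations, lambda, alpha)
  def trainImplicit(ratings: Seq[Double], rank: Int, iterations: Int,
                    lambda: Double, alpha: Double): String =
    s"trainImplicit(lambda=$lambda, alpha=$alpha)"
}

val sampleRatings = Seq(5.0, 1.0)
val alpha = 0.01

// Compiles: the fourth argument binds to the parameter named lambda.
val explicitCall = AlsStub.train(sampleRatings, 10, 20, alpha)

// Compiles: lambda and alpha are both given, in that order.
val implicitCall = AlsStub.trainImplicit(sampleRatings, 10, 20, 0.01, alpha)

// Does not compile, because there is no four-parameter trainImplicit:
// AlsStub.trainImplicit(sampleRatings, 10, 20, alpha)
```

So the ALS.train run in the question was most likely training an explicit-feedback model with lambda = 0.01 rather than using alpha at all.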

Spiro Michaylov answered Sep 28 '22 at 12:09