
How to set preferences for ALS implicit feedback in Collaborative Filtering?

I am trying to use Spark MLlib ALS with implicit feedback for collaborative filtering. The input data has only two fields, userId and productId. I have no product ratings, only information on which products users have bought. So to train ALS I use:

def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int): MatrixFactorizationModel

(http://spark.apache.org/docs/1.0.0/api/scala/index.html#org.apache.spark.mllib.recommendation.ALS$)

This API requires a Rating object:

Rating(user: Int, product: Int, rating: Double)

On the other hand, the documentation for trainImplicit says: Train a matrix factorization model given an RDD of 'implicit preferences' ratings given by users to some products, in the form of (userID, productID, preference) pairs.

When I set the rating / preference to 1, as in:

val ratings = sc.textFile(new File(dir, file).toString).map { line =>
  val fields = line.split(",")
  // format: (randomNumber, Rating(userId, productId, rating))
  (rnd.nextInt(100), Rating(fields(0).toInt, fields(1).toInt, 1.0))
}

val training = ratings.filter(x => x._1 < 60)
  .values
  .repartition(numPartitions)
  .cache()
val validation = ratings.filter(x => x._1 >= 60 && x._1 < 80)
  .values
  .repartition(numPartitions)
  .cache()
val test = ratings.filter(x => x._1 >= 80).values.cache()

And then train ALS on the training split:

val model = ALS.trainImplicit(training, rank, numIter)

I get an RMSE of 0.9, which is a large error given that preferences only take the value 0 or 1:

val validationRmse = computeRmse(model, validation, numValidation)

/** Compute RMSE (Root Mean Squared Error). */
def computeRmse(model: MatrixFactorizationModel, data: RDD[Rating], n: Long): Double = {
  // Predict a score for every (user, product) pair in the data set
  val predictions: RDD[Rating] = model.predict(data.map(x => (x.user, x.product)))
  // Pair each predicted score with the actual rating for the same (user, product) key
  val predictionsAndRatings = predictions.map(x => ((x.user, x.product), x.rating))
    .join(data.map(x => ((x.user, x.product), x.rating)))
    .values
  math.sqrt(predictionsAndRatings.map(x => (x._1 - x._2) * (x._1 - x._2)).reduce(_ + _) / n)
}

So my question is: to what value should I set rating in:

Rating(user: Int, product: Int, rating: Double)

for implicit training (in the ALS.trainImplicit method)?

Update

With:

  val alpha = 40
  val lambda = 0.01

I get:

Got 1895593 ratings from 17471 users on 462685 products.
Training: 1136079, validation: 380495, test: 379019
RMSE (validation) = 0.7537217888106758 for the model trained with rank = 8 and numIter = 10.
RMSE (validation) = 0.7489005441881798 for the model trained with rank = 8 and numIter = 20.
RMSE (validation) = 0.7387672873747732 for the model trained with rank = 12 and numIter = 10.
RMSE (validation) = 0.7310003522283959 for the model trained with rank = 12 and numIter = 20.
The best model was trained with rank = 12, and numIter = 20, and its RMSE on the test set is 0.7302343904091481.
baselineRmse: 0.0 testRmse: 0.7302343904091481
The best model improves the baseline by -Infinity%.

This is still a large error, I guess. I also get a strange baseline improvement, where the baseline model simply predicts the mean rating (1).
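
The -Infinity% figure in the log above can be reproduced with plain arithmetic. Since every rating is 1.0, a baseline that always predicts the mean is exactly right, its RMSE is 0.0, and dividing by it yields -Infinity. The sketch below assumes the improvement is computed as (baselineRmse - testRmse) / baselineRmse * 100; that formula is not shown in the question, and the object and method names are hypothetical.

```scala
// Illustrative only: BaselineCheck and its methods are hypothetical names.
object BaselineCheck {
  /** RMSE between paired predictions and actual values. */
  def rmse(predicted: Seq[Double], actual: Seq[Double]): Double = {
    val squaredErrors = predicted.zip(actual).map { case (p, a) => (p - a) * (p - a) }
    math.sqrt(squaredErrors.sum / actual.size)
  }

  /** Relative improvement in percent (assumed formula, not taken from the question). */
  def improvement(baselineRmse: Double, testRmse: Double): Double =
    (baselineRmse - testRmse) / baselineRmse * 100

  def main(args: Array[String]): Unit = {
    // Every observed purchase is encoded as rating 1.0, as in the question.
    val actual = Seq.fill(5)(1.0)
    val mean = actual.sum / actual.size
    // The baseline predicts the mean everywhere, so it matches every rating exactly.
    val baselineRmse = rmse(Seq.fill(actual.size)(mean), actual)
    println(s"baselineRmse = $baselineRmse")
    // A negative number divided by 0.0 is -Infinity in IEEE 754 double arithmetic.
    println(s"improvement = ${improvement(baselineRmse, 0.7302343904091481)}%")
  }
}
```

With constant ratings, any comparison against the mean baseline is degenerate, so this percentage carries no information about model quality.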

zork asked Dec 26 '14 15:12


1 Answer

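You can specify the alpha confidence parameter. The default is 1.0, but try a lower value. The trainImplicit overload that accepts alpha also takes lambda, so pass both:

val lambda = 0.01  // regularization parameter (0.01 is the MLlib default)
val alpha = 0.01
val model = ALS.trainImplicit(ratings, rank, numIterations, lambda, alpha)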

Let us know how that goes.
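
For context, here is a minimal sketch of what alpha changes. It assumes the formulation used by the implicit-feedback ALS variant that MLlib implements: the raw value r is turned into a binary preference, and alpha only scales the confidence placed on it, c = 1 + alpha * r. The object and function names are illustrative and not part of the Spark API.

```scala
// Illustrative only: ImplicitConfidence is a hypothetical name, not a Spark class.
object ImplicitConfidence {
  /** Binary preference: 1.0 if the user interacted with the item at all, else 0.0. */
  def preference(r: Double): Double = if (r > 0.0) 1.0 else 0.0

  /** Confidence weight c = 1 + alpha * r placed on that preference. */
  def confidence(alpha: Double, r: Double): Double = 1.0 + alpha * r

  def main(args: Array[String]): Unit = {
    val r = 1.0 // every purchase is encoded as rating 1.0, as in the question
    for (alpha <- Seq(0.01, 1.0, 40.0)) {
      println(s"alpha=$alpha preference=${preference(r)} confidence=${confidence(alpha, r)}")
    }
  }
}
```

Under this assumption, a rating of 1.0 for every purchase is a reasonable encoding: the target the model fits is the 0/1 preference, while alpha controls how much more observed purchases count than unobserved pairs.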

WestCoastProjects answered Sep 22 '22 15:09