
Spark 2.0 ALS recommendation: how to recommend products to a user

I followed the guide at http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html

But it is outdated, since it uses the RDD-based Spark MLlib API, while Spark 2.0 has the DataFrame-based API. I now have the updated code:

val ratings = spark.read.textFile("data/mllib/als/sample_movielens_ratings.txt")
  .map(parseRating)
  .toDF()
val Array(training, test) = ratings.randomSplit(Array(0.8, 0.2))

// Build the recommendation model using ALS on the training data
val als = new ALS()
  .setMaxIter(5)
  .setRegParam(0.01)
  .setUserCol("userId")
  .setItemCol("movieId")
  .setRatingCol("rating")
val model = als.fit(training)
// Evaluate the model by computing the RMSE on the test data
val predictions = model.transform(test)

Here is the problem: in the old code the fitted model was a MatrixFactorizationModel, whereas the new API returns its own model type, ALSModel.

With MatrixFactorizationModel you could directly do

val recommendations = bestModel.get
  .predict(userID)

which gives the list of products the user is most likely to like.

But ALSModel has no .predict method. Any idea how to recommend a list of products for a given user ID?

KaustubhKhati asked Dec 20 '16 14:12




2 Answers

Use the transform method on the model:

import spark.implicits._

// Column names must match the ones set on the ALS estimator via setUserCol / setItemCol
val dataFrameToPredict = spark.sparkContext.parallelize(Seq((111, 222)))
    .toDF("userId", "productId")
val predictionsOfProducts = model.transform(dataFrameToPredict)

When this answer was written, there was a JIRA ticket to implement recommend(User|Product) methods, but it was not yet merged into the default branch.

Now you have a DataFrame with a predicted score for each user-product pair.

You can simply use orderBy and limit to show the top N recommended products:

import org.apache.spark.sql.Row

// The where clause matters when dataFrameToPredict is large and covers many users
model.transform(dataFrameToPredict.where('userId === givenUserId))
    .select('productId, 'prediction)
    .orderBy('prediction.desc)
    .limit(N)
    // ALSModel emits the prediction column as a Float, not a Double
    .map { case Row(productId: Int, prediction: Float) => (productId, prediction) }
    .collect()

The dataFrameToPredict DataFrame can be any large set of user-product pairs, for example the cross product of all users and all products.
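To make concrete what transform followed by orderBy and limit computes, here is a Spark-free sketch in plain Python. It assumes the usual matrix-factorization view of ALS, in which a predicted score is the dot product of a user's latent factor vector and an item's. The factor values and the names user_factors, item_factors, predict and top_n_for_user are made up for illustration and are not part of the Spark API.

```python
from typing import Dict, List, Tuple

# Hypothetical latent factors of rank 2; a fitted ALS model learns such vectors.
user_factors: Dict[int, List[float]] = {
    111: [1.0, 0.5],
    112: [0.0, 2.0],
}
item_factors: Dict[int, List[float]] = {
    1: [1.0, 0.0],
    2: [0.0, 1.0],
    3: [2.0, 2.0],
}


def predict(user_id: int, item_id: int) -> float:
    """Score one (user, item) pair as the dot product of their factor vectors."""
    u = user_factors[user_id]
    v = item_factors[item_id]
    return sum(a * b for a, b in zip(u, v))


def top_n_for_user(user_id: int, n: int) -> List[Tuple[int, float]]:
    """Score every item for one user, sort by score descending, keep the first n."""
    scored = [(item_id, predict(user_id, item_id)) for item_id in item_factors]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


print(top_n_for_user(111, 2))
```

Scoring every item for a single user corresponds to the where filter on userId, and the sort plus slice corresponds to orderBy('prediction.desc) and limit(N).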

T. Gawęda answered Oct 21 '22 02:10


ALSModel in Spark provides the following helpful methods (the two recommendForAll* methods since Spark 2.2, the two subset variants since Spark 2.3):

  • recommendForAllItems(int numUsers)

    Returns top numUsers users recommended for each item, for all items.

  • recommendForAllUsers(int numItems)

    Returns top numItems items recommended for each user, for all users.

  • recommendForItemSubset(Dataset<?> dataset, int numUsers)

    Returns top numUsers users recommended for each item id in the input data set.

  • recommendForUserSubset(Dataset<?> dataset, int numItems)

    Returns top numItems items recommended for each user id in the input data set.


For example, in Python:

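from pyspark.ml.recommendation import ALS
from pyspark.sql.functions import explode

alsEstimator = ALS()

# The setters configure alsEstimator in place
(alsEstimator.setRank(1)
  .setUserCol("user_id")
  .setItemCol("product_id")
  .setRatingCol("rating")
  .setMaxIter(20)
  .setColdStartStrategy("drop"))

# productRatings: DataFrame with user_id, product_id and rating columns
alsModel = alsEstimator.fit(productRatings)

# TargetUsers: DataFrame holding the user_id values to recommend for
recommendForSubsetDF = alsModel.recommendForUserSubset(TargetUsers, 40)

# One row per (user, recommended product) instead of one array per user
recommendationsDF = (recommendForSubsetDF
  .select("user_id", explode("recommendations").alias("recommendation"))
  .select("user_id", "recommendation.*")
)

# display() is a Databricks notebook helper; elsewhere use recommendationsDF.show()
display(recommendationsDF)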

For example, in Scala:

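import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.functions.explode
import spark.implicits._ // needed for the $"column" syntax

val alsEstimator = new ALS().setRank(1)
  .setUserCol("user_id")
  .setItemCol("product_id")
  .setRatingCol("rating")
  .setMaxIter(20)
  .setColdStartStrategy("drop")

// productRatings: DataFrame with user_id, product_id and rating columns
val alsModel = alsEstimator.fit(productRatings)

// sampleTargetUsers: DataFrame holding the user_id values to recommend for
val recommendForSubsetDF = alsModel.recommendForUserSubset(sampleTargetUsers, 40)

// One row per (user, recommended product) instead of one array per user
val recommendationsDF = recommendForSubsetDF
  .select($"user_id", explode($"recommendations").alias("recommendation"))
  .select($"user_id", $"recommendation.*")

// display() is a Databricks notebook helper; elsewhere use recommendationsDF.show()
display(recommendationsDF)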
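The explode step in the examples above may be easier to follow with a Spark-free sketch in plain Python. It assumes that each result row pairs a user id with a list of (product id, predicted rating) entries, which is how the recommendations column is unpacked above. The sample values and the names recommendations and flatten are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical result, shaped like the output of recommendForUserSubset:
# one entry per user, holding a list of (product_id, predicted rating) pairs.
recommendations: Dict[int, List[Tuple[int, float]]] = {
    7: [(30, 4.5), (12, 4.1)],
    9: [(12, 3.9)],
}


def flatten(recs: Dict[int, List[Tuple[int, float]]]) -> List[Tuple[int, int, float]]:
    """Emit one (user_id, product_id, rating) row per recommendation,
    which is the effect of explode followed by selecting recommendation.*"""
    return [
        (user_id, product_id, rating)
        for user_id, items in recs.items()
        for product_id, rating in items
    ]


for row in flatten(recommendations):
    print(row)
```

The flat form is usually the more convenient one for joining against a product table or writing the results out.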
Joshua Cook answered Oct 21 '22 01:10