Spark 2.0 ALS recommendation: how to recommend items to a user

I followed the guide at http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html

But it is outdated, because it uses the RDD-based Spark MLlib API, while Spark 2.0 has a DataFrame-based API. This is the updated code I have:

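import org.apache.spark.ml.recommendation.ALS

// parseRating (and its Rating case class) are assumed to be defined as in the Spark ALS example
val ratings = spark.read.textFile("data/mllib/als/sample_movielens_ratings.txt")
  .map(parseRating)
  .toDF()
val Array(training, test) = ratings.randomSplit(Array(0.8, 0.2))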

// Build the recommendation model using ALS on the training data
val als = new ALS()
  .setMaxIter(5)
  .setRegParam(0.01)
  .setUserCol("userId")
  .setItemCol("movieId")
  .setRatingCol("rating")
val model = als.fit(training)
// Evaluate the model by computing the RMSE on the test data
val predictions = model.transform(test)

Here is the problem: in the old code the fitted model was a MatrixFactorizationModel, but the new API returns its own model type (ALSModel).

With MatrixFactorizationModel you could directly do:

val recommendations = bestModel.get
  .predict(userID)

This returns the list of products the user is most likely to like.

But ALSModel has no .predict method. Any idea how to recommend a list of products for a given user ID?

asked Dec 20 '16 14:12

KaustubhKhati

2 Answers

Use the transform method on the model:

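import spark.implicits._

// Column names must match the model's userCol and itemCol
// (the question's model would need "movieId" rather than "productId")
val dataFrameToPredict = Seq((111, 222)).toDF("userId", "productId")
val predictionsOfProducts = model.transform(dataFrameToPredict)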

There's a JIRA ticket to implement a recommend(User|Product) method, but it is not yet on the default branch.

Now you have a DataFrame with a predicted score for each user-product pair.

You can simply use orderBy and limit to show N recommended products:

import org.apache.spark.sql.Row

// The where clause restricts a large multi-user DataFrame to the given user
model.transform(dataFrameToPredict.where('userId === givenUserId))
    .select('productId, 'prediction)
    .orderBy('prediction.desc)
    .limit(N)
    // ALSModel writes the prediction column as a Float
    .map { case Row(productId: Int, prediction: Float) => (productId, prediction) }
    .collect()

dataFrameToPredict can be any large user-product DataFrame, for example the cross product of all users and all products.
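
The candidate-building and ranking steps described above can be sketched outside Spark. This is a minimal pure-Python illustration, not Spark code: `score` is a hypothetical stand-in for the prediction that `model.transform` attaches to each row, and the user and product IDs are made up.

```python
from itertools import product
import heapq

# Hypothetical IDs; in Spark these would come from the distinct values
# of the userId and productId columns.
user_ids = [1, 2]
product_ids = [10, 20, 30, 40]


def score(user_id, product_id):
    """Hypothetical stand-in for the model's predicted rating."""
    return ((user_id * 7 + product_id * 3) % 11) / 2.0


# "All users x all products": every (user, product) pair is a candidate row.
candidates = list(product(user_ids, product_ids))

# Counterpart of transform: attach a prediction to every candidate row.
predictions = [(u, p, score(u, p)) for u, p in candidates]


def top_n(rows, given_user_id, n):
    """Mirror where(userId === givenUserId), orderBy(prediction.desc), limit(n)."""
    own_rows = [(p, s) for u, p, s in rows if u == given_user_id]
    return heapq.nlargest(n, own_rows, key=lambda row: row[1])


print(top_n(predictions, given_user_id=1, n=2))
```

The candidate set grows as users times products, so on real data it is usually worth filtering to the users of interest before scoring rather than after.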

answered Oct 21 '22 02:10

T. Gawęda

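Newer versions of Spark's ALSModel provide the following helpful methods (the all-users and all-items variants since Spark 2.2, the subset variants since Spark 2.3):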

recommendForAllItems(int numUsers)

Returns top numUsers users recommended for each item, for all items.

recommendForAllUsers(int numItems)

Returns top numItems items recommended for each user, for all users.

recommendForItemSubset(Dataset<?> dataset, int numUsers)

Returns top numUsers users recommended for each item id in the input data set.

recommendForUserSubset(Dataset<?> dataset, int numItems)

Returns top numItems items recommended for each user id in the input data set.

e.g. Python:

from pyspark.ml.recommendation import ALS
from pyspark.sql.functions import explode

alsEstimator = ALS()

(alsEstimator.setRank(1)
  .setUserCol("user_id")
  .setItemCol("product_id")
  .setRatingCol("rating")
  .setMaxIter(20)
  .setColdStartStrategy("drop"))

alsModel = alsEstimator.fit(productRatings)

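# TargetUsers: a DataFrame containing the user_id values to recommend for
recommendForSubsetDF = alsModel.recommendForUserSubset(TargetUsers, 40)

# One row per (user, recommended item); "recommendation.*" expands the struct fields
recommendationsDF = (recommendForSubsetDF
  .select("user_id", explode("recommendations").alias("recommendation"))
  .select("user_id", "recommendation.*")
)

# display() is a Databricks notebook helper; use recommendationsDF.show() elsewhere
display(recommendationsDF)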

e.g. Scala:

import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.functions.explode
import spark.implicits._ // needed for the $"..." column syntax

val alsEstimator = new ALS().setRank(1)
  .setUserCol("user_id")
  .setItemCol("product_id")
  .setRatingCol("rating")
  .setMaxIter(20)
  .setColdStartStrategy("drop")

val alsModel = alsEstimator.fit(productRatings)

val recommendForSubsetDF = alsModel.recommendForUserSubset(sampleTargetUsers, 40)

val recommendationsDF = recommendForSubsetDF
  .select($"user_id", explode($"recommendations").alias("recommendation"))
  .select($"user_id", $"recommendation.*")

// display() is a Databricks notebook helper; use recommendationsDF.show() elsewhere
display(recommendationsDF)
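
To make the shape of these results concrete, here is a small pure-Python sketch of what a per-user top-N recommendation computes from a factorization model: the predicted rating is the dot product of a user's factor vector and an item's factor vector. The factor values and IDs are invented for illustration, the function names are hypothetical, and Spark's own distributed implementation differs in detail.

```python
import heapq

# Hypothetical learned factors of rank 2; on a fitted model the real values
# live in alsModel.userFactors and alsModel.itemFactors.
user_factors = {1: [1.0, 0.5], 2: [0.25, 2.0]}
item_factors = {10: [2.0, 1.0], 20: [0.5, 3.0], 30: [4.0, 0.0]}


def dot(a, b):
    """Dot product of two equal-length factor vectors."""
    return sum(x * y for x, y in zip(a, b))


def recommend_for_all_users(num_items):
    """Return {user_id: [(item_id, rating), ...]} with the top num_items per user."""
    result = {}
    for user_id, uf in user_factors.items():
        scored = [(item_id, dot(uf, vf)) for item_id, vf in item_factors.items()]
        result[user_id] = heapq.nlargest(num_items, scored, key=lambda r: r[1])
    return result


def flatten(recs):
    """One row per (user, item, rating), like explode followed by "recommendation.*"."""
    return [(u, i, r) for u, items in recs.items() for i, r in items]


nested = recommend_for_all_users(2)
for row in flatten(nested):
    print(row)
```

The nested dictionary mirrors the one-row-per-user result with an array of (item, rating) structs, and `flatten` mirrors the explode step used in the examples above.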

answered Oct 21 '22 01:10

Joshua Cook

Tags: machine-learning, scala, apache-spark, apache-spark-2.0