How to do item-based recommendation in Spark MLlib?

Mahout supports item-based recommendation through this API method:

ItemBasedRecommender.mostSimilarItems(int productid, int maxResults, Rescorer rescorer)

In Spark MLlib, however, it appears that the ALS APIs can fetch recommended products only when a user ID is provided:

MatrixFactorizationModel.recommendProducts(int user, int num)

Is there a way to get recommended products based on a similar product, without having to provide a user ID, similar to how Mahout performs item-based recommendation?


asked Dec 17 '14 18:12

user321532

2 Answers

Spark 1.2.x versions do not provide an "item-similarity based recommender" like the ones present in Mahout.

However, MLlib currently supports model-based collaborative filtering, where users and products are described by a small set of latent factors. (Understand the use case for implicit feedback (views, clicks) versus explicit feedback (ratings) when constructing the user-item matrix.)

MLlib uses the alternating least squares (ALS) algorithm (which can be considered similar to the SVD algorithm) to learn these latent factors.

If you need to build a purely item-similarity-based recommender, I would recommend this:

Represent each item by a feature vector
Construct an item-item similarity matrix by computing a similarity metric (such as cosine) for each pair of items
Use this item similarity matrix to find similar items for users
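The three steps above can be sketched in plain Python, without Spark, to show what the similarity matrix contains. This is only a minimal illustration: the item IDs, the feature vectors, and the helper names (cosine, item_similarity_matrix, similar_items) are made up for the example and are not part of Mahout or MLlib.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def item_similarity_matrix(features):
    """Step 2: compute the similarity of every pair of distinct items.

    features maps item_id -> feature vector.
    Returns a dict mapping (item_i, item_j) -> similarity, stored symmetrically.
    """
    ids = sorted(features)
    sims = {}
    for idx, i in enumerate(ids):
        for j in ids[idx + 1:]:
            s = cosine(features[i], features[j])
            sims[(i, j)] = s
            sims[(j, i)] = s
    return sims

def similar_items(sims, item_id, max_results):
    """Step 3: look up the items most similar to item_id, best first."""
    row = [(other, s) for (i, other), s in sims.items() if i == item_id]
    row.sort(key=lambda pair: pair[1], reverse=True)
    return row[:max_results]

# Step 1: hypothetical items, each represented by a feature vector.
item_features = {
    101: [1.0, 0.0, 1.0],
    102: [1.0, 0.0, 0.9],
    103: [0.0, 1.0, 0.0],
}

sims = item_similarity_matrix(item_features)
print(similar_items(sims, 101, 2))
```

No user ID appears anywhere in the lookup, which is the behaviour the question asks for. The nested loop visits every pair of items, so this form is only practical for small catalogues.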

Since similarity matrices do not scale well (imagine how your similarity matrix would grow if you had 10000 items instead of 100), this write-up on DIMSUM might be helpful if you plan to implement this for a large number of items:

https://databricks.com/blog/2014/10/20/efficient-similarity-algorithm-now-in-spark-twitter.html
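To put numbers on the scaling concern, a brute-force similarity matrix needs one comparison per unordered pair of items, which is n(n-1)/2 for n items. The helper name pair_count below is just for illustration.

```python
def pair_count(n_items):
    """Number of unordered item pairs a brute-force similarity matrix must score."""
    return n_items * (n_items - 1) // 2

for n in (100, 10000):
    print(n, "items ->", pair_count(n), "pairs")
# 100 items -> 4950 pairs
# 10000 items -> 49995000 pairs
```

Going from 100 to 10000 items multiplies the work by roughly ten thousand, which is why a sampling approach such as DIMSUM becomes attractive.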


answered Nov 04 '22 01:11

Vedant

Please see my implementation of an item-item recommendation model using Apache Spark here. You can implement this by using the productFeatures matrix that is generated when you run the MLlib ALS algorithm on user-product-rating data. The ALS algorithm essentially factorizes the ratings matrix into two matrices: userFeatures and productFeatures. You can run cosine similarity on the rows of the productFeatures matrix (one vector of length rank per product) to find item-item similarity.
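As a rough sketch of that idea (not the linked implementation), assume the product factors have already been pulled out of the trained model as (product ID, factor vector) pairs. The function below then answers a "most similar items" query without any user ID. The sample factor values and the function names are invented for the example.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)

def most_similar_items(product_features, product_id, max_results):
    """Return up to max_results (product_id, cosine) pairs, most similar first.

    product_features: iterable of (product_id, factor_vector) pairs,
    assumed here to hold the latent factors learned by ALS.
    """
    unit = {pid: normalize(vec) for pid, vec in product_features}
    query = unit[product_id]
    scored = (
        (sum(a * b for a, b in zip(query, vec)), pid)
        for pid, vec in unit.items()
        if pid != product_id  # never recommend the query item itself
    )
    top = heapq.nlargest(max_results, scored)
    return [(pid, score) for score, pid in top]

# Hypothetical rank-2 product factors; real values would come from a trained model.
product_features = [
    (1, [0.9, 0.1]),
    (2, [0.8, 0.2]),
    (3, [0.1, 0.9]),
    (4, [-0.9, -0.1]),
]

print(most_similar_items(product_features, 1, 2))
```

Normalizing each vector once and keeping only the top results avoids materializing the full item-item matrix for a single query. For a large catalogue the same computation would be distributed rather than run over an in-memory dict.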

answered Nov 03 '22 23:11

user2774957

Tags: apache-spark, mahout, apache-spark-mllib, recommendation-engine
