
How to do item-based recommendation in Spark MLlib?

Mahout supports item-based recommendation through this API method:

ItemBasedRecommender.mostSimilarItems(int productid, int maxResults, Rescorer rescorer)

In Spark MLlib, however, it appears that the ALS APIs can fetch recommended products only when a user ID is provided:

MatrixFactorizationModel.recommendProducts(int user, int num)

Is there a way to get recommended products based on a similar product, without having to provide a user ID, similar to how Mahout performs item-based recommendation?

user321532 asked Dec 17 '14 18:12




2 Answers

Spark 1.2.x versions do not provide an "item-similarity based recommender" like the one in Mahout.

However, MLlib currently supports model-based collaborative filtering, in which users and products are described by a small set of latent factors. (When constructing the user-item matrix, make sure you understand whether your use case involves implicit feedback, such as views and clicks, or explicit feedback, such as ratings.)

MLlib uses the alternating least squares (ALS) algorithm, which can be considered similar to SVD, to learn these latent factors.
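
As a rough sketch of what "latent factors" means here, the textbook explicit-feedback formulation is shown below. The symbols are assumptions introduced for illustration: R is the user-item rating matrix, U and P hold one factor vector per user and per product, Ω is the set of observed ratings, and λ is a regularization weight. The exact regularization scaling MLlib applies may differ from this form.

```latex
R \approx U P^{\top}, \qquad
\min_{U,P} \sum_{(u,i) \in \Omega} \left( r_{ui} - \mathbf{u}_u^{\top} \mathbf{p}_i \right)^2
+ \lambda \left( \sum_{u} \lVert \mathbf{u}_u \rVert^2 + \sum_{i} \lVert \mathbf{p}_i \rVert^2 \right)
```

ALS alternates between fixing P and solving a least-squares problem for U, then fixing U and solving for P.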

If you need a purely item-similarity based recommender, I would recommend this:

  1. Represent all items by a feature vector
  2. Construct an item-item similarity matrix by computing a similarity metric (such as cosine) for each pair of items
  3. Use this item similarity matrix to find similar items for users
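
The three steps above can be sketched in plain Python as follows. This is a minimal, single-machine illustration and not Spark code: the item IDs, the feature values, and the helper names (`cosine`, `similarity_matrix`, `most_similar`) are all made up for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(item_features):
    """Step 2: map each item ID to {other item ID: cosine similarity}."""
    ids = sorted(item_features)
    sims = {i: {} for i in ids}
    for pos, i in enumerate(ids):
        # Cosine is symmetric, so each pair is computed once and stored twice.
        for j in ids[pos + 1:]:
            s = cosine(item_features[i], item_features[j])
            sims[i][j] = s
            sims[j][i] = s
    return sims

def most_similar(sims, item_id, max_results):
    """Step 3: the max_results items most similar to item_id, best first."""
    ranked = sorted(sims[item_id].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:max_results]

# Step 1: hypothetical feature vectors (values invented for illustration).
item_features = {
    101: [1.0, 0.0, 1.0],
    102: [1.0, 0.1, 0.9],
    103: [0.0, 1.0, 0.0],
}
sims = similarity_matrix(item_features)
top = most_similar(sims, 101, 2)
```

Here `top` ranks item 102 ahead of item 103 for the query item 101, and no user ID is involved at any point.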

Similarity matrices do not scale well, because the number of entries grows with the square of the number of items (100 items give 10,000 entries, while 10,000 items give 100 million). This article on DIMSUM might be helpful if you plan to implement this for a large number of items:

https://databricks.com/blog/2014/10/20/efficient-similarity-algorithm-now-in-spark-twitter.html

Vedant answered Nov 04 '22 01:11


Please see my implementation of an item-item recommendation model using Apache Spark here. You can implement this using the productFeatures matrix that is generated when you run the MLlib ALS algorithm on user-product-ratings data. ALS essentially factorizes the ratings matrix into two matrices: userFeatures and productFeatures. You can compute the cosine similarity between rows of the productFeatures matrix (one vector of length rank per product) to find item-item similarity.
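
As a rough illustration of that idea (not the linked implementation), the sketch below answers a `mostSimilarItems`-style query from product factor vectors alone. It assumes the `productFeatures` output of a trained model, which is exposed as pairs of product ID and factor array, has already been collected into a local list. The function name `most_similar_items` and the sample factor values are hypothetical.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)

def most_similar_items(product_features, product_id, max_results):
    """Return up to max_results (product ID, cosine similarity) pairs, best first.

    product_features: iterable of (product ID, factor vector) pairs.
    """
    # After normalization, a dot product equals the cosine similarity.
    unit = {pid: normalize(vec) for pid, vec in product_features}
    query = unit[product_id]
    scored = (
        (pid, sum(q * x for q, x in zip(query, vec)))
        for pid, vec in unit.items()
        if pid != product_id  # never recommend the query product itself
    )
    # nlargest avoids sorting every candidate when only a few are needed.
    return heapq.nlargest(max_results, scored, key=lambda pair: pair[1])

# Hypothetical rank-2 factor vectors standing in for collected productFeatures.
product_features = [
    (1, [0.9, 0.1]),
    (2, [0.8, 0.2]),
    (3, [0.1, 0.9]),
    (4, [-0.9, -0.1]),
]
similar = most_similar_items(product_features, 1, 2)
```

This scans every product for each query, which is fine for a catalog that fits in memory. For a large catalog the all-pairs computation would need to be distributed instead.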

user2774957 answered Nov 03 '22 23:11