In Mahout, there is support for item based recommendation using API method:
ItemBasedRecommender.mostSimilarItems(int productid, int maxResults, Rescorer rescorer)
But in Spark Mllib, it appears that the APIs within ALS can fetch recommended products but userid must be provided via:
MatrixFactorizationModel.recommendProducts(int user, int num)
Is there a way to get recommended products based on a similar product without having to provide user id information, similar to how mahout performs item based recommendation.
ALS recommender is a matrix factorization algorithm that uses Alternating Least Squares with Weighted-Lamda-Regularization (ALS-WR). It factors the user to item matrix A into the user-to-feature matrix U and the item-to-feature matrix M : It runs the ALS algorithm in a parallel fashion.
ALS is implemented in Apache Spark ML and built for a larges-scale collaborative filtering problems. ALS is doing a pretty good job at solving scalability and sparseness of the Ratings data, and it's simple and scales well to very large datasets.
Spark 1.2x versions do not provide with a "item-similarity based recommender" like the ones present in Mahout.
However, MLlib currently supports model-based collaborative filtering, where users and products are described by a small set of latent factors {Understand the use case for implicit (views, clicks) and explicit feedback (ratings) while constructing a user-item matrix.}
MLlib uses the alternating least squares (ALS) algorithm [can be considered similar to the SVD algorithm] to learn these latent factors.
If you need to construct purely an item-similarity based recommender, I would recommend this:
Since similarity matrices do not scale well, (imagine how your similarity matrix would grow if you had 100 items vs 10000 items) this read on DIMSUM might be helpful if you're planning to implement it on a large number of items:
https://databricks.com/blog/2014/10/20/efficient-similarity-algorithm-now-in-spark-twitter.html
Please see my implementation of item-item recommendation model using Apache Spark here. You can implement this by using the productFeatures matrix that is generated when you run the MLib ALS algorithm on user-product-ratings data. The ALS algorithm essentially factorizes two matrix - one is userFeatures and the other is productFeatures matrix. You can run a cosine similarity on the productFeatures rank matrix to find item-item similarity.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With