How to score all user-product combinations in Spark MatrixFactorizationModel?

Given a MatrixFactorizationModel what would be the most efficient way to return the full matrix of user-product predictions (in practice, filtered by some threshold to maintain sparsity)?

Via the current API, one could pass a cartesian product of user-product pairs to the predict function, but it seems to me that this would do a lot of extra processing.

Would accessing the private userFeatures and productFeatures be the correct approach, and if so, is there a good way to take advantage of other aspects of the framework to distribute this computation efficiently? Specifically, is there an easy way to do better than multiplying all pairs of userFeature, productFeature "by hand"?

cohoz asked Oct 12 '14 15:10

1 Answer

Spark 1.1 has a recommendProducts method that can be mapped to each user ID. This is better than nothing but not really optimized for recommending to all users.

I would double-check that you really mean to make recommendations for everyone; at scale, this is inherently a big slow operation. Consider predicting for users that have been recently active only.

Otherwise, yes, your best bet is to write your own method. A cartesian join of the two feature RDDs is probably too slow, because it shuffles so many copies of the feature vectors. Instead, take the larger of the user and product feature sets and map over it, holding the other (smaller) feature set in memory on each worker. If the smaller set doesn't fit in memory, you can make this more complex: map several times, each time against a subset of the smaller set held in memory.
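A minimal sketch of that idea, written in plain Python rather than against the Spark API so that it runs on its own. The function names (score_partition, score_in_chunks), the threshold parameter and the tuple layout are all hypothetical, not part of MLlib. The assumption is that, in Spark, score_partition would be the body of a mapPartitions call over the larger feature RDD, with the smaller feature set collected and shipped to the workers as a broadcast variable.

```python
from itertools import islice
from typing import Dict, Iterator, Sequence, Tuple

Features = Sequence[float]
Scored = Tuple[int, int, float]


def score_partition(
    large_side: Sequence[Tuple[int, Features]],
    small_side: Dict[int, Features],
    threshold: float,
) -> Iterator[Scored]:
    """Yield (large_id, small_id, score) for each pair scoring >= threshold.

    large_side stands in for one partition of the larger feature set;
    small_side is the other feature set, held entirely in memory.
    """
    for large_id, large_vec in large_side:
        for small_id, small_vec in small_side.items():
            # The predicted rating is the dot product of the two factor vectors.
            score = sum(a * b for a, b in zip(large_vec, small_vec))
            if score >= threshold:
                yield (large_id, small_id, score)


def score_in_chunks(
    large_side: Sequence[Tuple[int, Features]],
    small_side: Dict[int, Features],
    threshold: float,
    chunk_size: int,
) -> Iterator[Scored]:
    """Same result, but only chunk_size entries of small_side are used per pass.

    This mirrors mapping several times against subsets of the smaller set
    when it is too large to hold in memory at once.
    """
    items = iter(small_side.items())
    while True:
        chunk = dict(islice(items, chunk_size))
        if not chunk:
            break
        yield from score_partition(large_side, chunk, threshold)


if __name__ == "__main__":
    # Toy data: three users (the larger side) and two products.
    users = [(1, [1.0, 0.0]), (2, [0.5, 0.5]), (3, [0.0, 1.0])]
    products = {10: [2.0, 0.0], 20: [0.0, 4.0]}
    for row in score_partition(users, products, threshold=1.5):
        print(row)
```

Filtering inside the loop keeps the output sparse, so only pairs above the threshold ever leave the worker. The chunked variant trades extra passes over the larger side for a smaller memory footprint per pass.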

Sean Owen answered Sep 30 '22 07:09