
Proper save/load of MatrixFactorizationModel

I have a MatrixFactorizationModel object. If I recommend products to a single user right after building the model with ALS.train(...), it takes 300 ms (for my data and hardware). But if I save the model to disk and load it back, the same recommendation takes almost 2000 ms. Spark also warns:

15/07/17 11:05:47 WARN MatrixFactorizationModel: User factor does not have a partitioner. Prediction on individual records could be slow.
15/07/17 11:05:47 WARN MatrixFactorizationModel: User factor is not cached. Prediction could be slow.
15/07/17 11:05:47 WARN MatrixFactorizationModel: Product factor does not have a partitioner. Prediction on individual records could be slow.
15/07/17 11:05:47 WARN MatrixFactorizationModel: Product factor is not cached. Prediction could be slow.

How can I create or set a partitioner and cache the user and product factors after loading the model? The following approach didn't help:

model.userFeatures().cache();
model.productFeatures().cache();

I also tried repartitioning those RDDs and creating a new model from the repartitioned versions, but that didn't help either.

Osmin asked Jul 17 '15 15:07


1 Answer

You don't need the parentheses in Scala: userFeatures is an RDD[(Int, Array[Double])] and takes no parameters.

This will help you:

model.userFeatures.cache
model.productFeatures.cache
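
Note that cache is lazy, so nothing is stored until an action runs on the RDD, and that RDD.repartition does not attach a partitioner, whereas partitionBy does. A minimal sketch of reloading a model with both warnings addressed might look like the following. The helper name loadOptimized and its path and numPartitions arguments are hypothetical, and whether this restores the original latency depends on your data and cluster:

```scala
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// Hypothetical helper (not part of Spark): reload a saved model and
// give its factor RDDs a partitioner and a cached storage level.
def loadOptimized(sc: SparkContext, path: String, numPartitions: Int): MatrixFactorizationModel = {
  val loaded = MatrixFactorizationModel.load(sc, path)
  val partitioner = new HashPartitioner(numPartitions)

  // partitionBy sets a partitioner on the result; repartition would not.
  val users = loaded.userFeatures.partitionBy(partitioner).cache()
  val products = loaded.productFeatures.partitionBy(partitioner).cache()

  // cache() is lazy: run an action so the factors are materialized now
  // rather than during the first prediction.
  users.count()
  products.count()

  // Build a new model around the partitioned, cached factors.
  new MatrixFactorizationModel(loaded.rank, users, products)
}
```

Calling something like loadOptimized(sc, "path/to/model", 8) once at startup, then reusing the returned model for recommendProducts, keeps the shuffle and caching cost out of individual requests.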
anshul_cached answered Dec 01 '22 18:12