I'm trying to use ALS from spark-ml with implicit ratings. I noticed that some predictions given by my trained model are negative or NaN. Why is that?
Apache Spark provides an option to enforce a non-negativity constraint on ALS.
To remove these negative values, you just need to set:
Python: nonnegative=True
Scala: setNonnegative(true)
when creating your ALS model, i.e.:
>>> als = ALS(rank=10, maxIter=5, seed=0, nonnegative=True)
Non-negative matrix factorization (NMF or NNMF), also called non-negative matrix approximation, is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have non-negative elements [Ref. Wikipedia].
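To make the definition concrete, here is a small NumPy sketch that factorizes a non-negative matrix V into non-negative W and H. It uses the classic multiplicative update rules, which is not how Spark's ALS enforces the constraint; the function name nmf, the toy matrix and the iteration count are assumptions chosen for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate V (n x m, non-negative) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Start from strictly positive random factors.
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        # Multiplicative updates: each step multiplies by a non-negative
        # ratio, so W and H can never become negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical 4 x 4 ratings-like matrix.
V = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])

W, H = nmf(V, rank=2)
print(np.round(W @ H, 2))  # low-rank, non-negative approximation of V
```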
If you want to read more about NMF, I'd recommend reading the following paper:
As for NaN values, they usually come from splitting your dataset: if a user or an item appears only in the test set and not in the training set, the model has no learned factors for it and cannot produce a prediction. This can also happen if you cross-validate your training. On that matter, there are a couple of JIRAs that are marked resolved for 2.2:
The latter lets you set the cold start strategy to use when creating your model.