Why does a spark-ml ALS model return NaN and negative predictions?

I'm trying to use ALS from spark-ml with implicit ratings.

I noticed that some predictions given by my trained model are negative or NaN. Why is that?

[Image: negative value]

Alberto Bonsanto asked Jul 04 '17 17:07


1 Answer

Apache Spark provides an option to enforce non-negativity constraints on ALS.

Thus, to remove these negative values, you just need to set:

Python:

nonnegative=True

Scala:

setNonnegative(true)

when creating your ALS model, i.e.:

>>> from pyspark.ml.recommendation import ALS
>>> als = ALS(rank=10, maxIter=5, seed=0, nonnegative=True)

Non-negative matrix factorization (NMF or NNMF), also called non-negative matrix approximation, is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have non-negative elements [Ref. Wikipedia].
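To make the link between the constraint and the predictions concrete: a matrix factorization model scores a (user, item) pair as the dot product of two factor vectors. Below is a minimal plain-Python sketch of that idea (it is not Spark code, and the factor values are made up for illustration). It shows that a dot product of unconstrained factors can be negative, while a dot product of non-negative factors cannot.

```python
# Plain-Python sketch (not Spark code): a factorization model's prediction is
# the dot product of a user factor vector and an item factor vector.

def predict(user_factors, item_factors):
    """Return the predicted score as a dot product of two factor vectors."""
    return sum(u * v for u, v in zip(user_factors, item_factors))

# Hypothetical factors from an unconstrained fit: entries may be negative.
unconstrained_user = [0.8, -0.5]
unconstrained_item = [0.2, 0.9]

# Hypothetical factors from a non-negative fit: every entry is >= 0.
nonneg_user = [0.8, 0.1]
nonneg_item = [0.2, 0.9]

unconstrained_score = predict(unconstrained_user, unconstrained_item)
nonneg_score = predict(nonneg_user, nonneg_item)

print(unconstrained_score < 0)  # a negative prediction is possible here
print(nonneg_score >= 0)        # sums of non-negative products stay >= 0
```

Since every term in the second dot product is a product of two non-negative numbers, the sum cannot drop below zero, which is why the constraint removes negative predictions.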

If you want to read more about NMF, I'd recommend reading the following paper:

As for the NaN values, they usually come from splitting your dataset. A user or item that appears only in the test set, and not in the training set, has no learned factors, so the model cannot produce a prediction for it. The same can happen if you cross-validate your training, since each fold is itself a split. A couple of related JIRAs are marked as resolved for Spark 2.2:

  • https://issues.apache.org/jira/browse/SPARK-14489
  • https://issues.apache.org/jira/browse/SPARK-19345

The latter will allow you to set the cold start strategy to use when creating your model.
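For intuition, here is a minimal plain-Python sketch of the cold start problem described above. It is not the Spark API: the factor tables, ids, and the `predict` helper are hypothetical stand-ins for a fitted model. In Spark 2.2 and later, the corresponding ALS parameter is `coldStartStrategy`, which accepts "nan" (the default) or "drop"; the sketch only mimics what those two options mean.

```python
import math

# Plain-Python sketch (not the Spark API). The factor tables below are
# hypothetical stand-ins for what a fitted model learned from training data.
user_factors = {1: [0.8, 0.1], 2: [0.3, 0.7]}
item_factors = {10: [0.2, 0.9], 11: [0.6, 0.4]}

def predict(user_id, item_id):
    """Dot product of learned factors, or NaN when either id was never seen."""
    u = user_factors.get(user_id)
    v = item_factors.get(item_id)
    if u is None or v is None:
        return float("nan")
    return sum(a * b for a, b in zip(u, v))

# Test split: user 3 and item 12 appear only here, not in training.
test_pairs = [(1, 10), (2, 11), (3, 10), (1, 12)]
predictions = [(u, i, predict(u, i)) for u, i in test_pairs]

# "nan"-style behaviour: keep every row, NaN predictions included.
nan_count = sum(math.isnan(p) for _, _, p in predictions)

# "drop"-style behaviour: discard rows whose prediction is NaN, so a metric
# computed on the remaining rows is a finite number.
dropped = [(u, i, p) for u, i, p in predictions if not math.isnan(p)]

print(nan_count)
print(len(dropped))
```

Dropping the NaN rows matters mostly for evaluation: a single NaN prediction makes an aggregate metric such as RMSE come out as NaN as well.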

eliasah answered Nov 08 '22 09:11