
How to deal with missing values in Python scikit-learn NMF

I am trying to apply NMF to my dataset using Python's scikit-learn. My dataset contains both genuine zero values and missing values, but scikit-learn does not allow NaN values in the data matrix. Some posts suggest replacing the missing values with zeros.

My questions are:

  • If I replace missing values with zeros, how can the algorithm tell the missing values apart from real zero values?

  • Are there any other NMF implementations that can deal with missing values?

  • Or are there any other matrix factorization algorithms that can predict the missing values?

Zhaojie Tao asked Sep 07 '16 10:09


People also ask

How do you resolve missing values in Python?

Checking for missing values using isnull() and notnull(): to check for missing values in a Pandas DataFrame, use the isnull() and notnull() methods. Both report, element by element, whether a value is missing (NaN) or not. They can also be called on a Pandas Series to find null values in that series.
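As a minimal sketch of the check described above, the snippet below builds a small DataFrame with one NaN and one genuine zero. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): column "a" holds one NaN and one genuine zero.
df = pd.DataFrame({"a": [1.0, np.nan, 0.0], "b": [4.0, 5.0, 6.0]})

missing_mask = df.isnull()              # True where a value is missing
present_mask = df.notnull()             # element-wise complement of isnull()
missing_per_column = df.isnull().sum()  # number of missing values per column
```

Note that the genuine zero in column "a" is not flagged as missing; only the NaN is.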

What is NMF Sklearn?

Non-Negative Matrix Factorization (NMF). Find two non-negative matrices, i.e. matrices with all non-negative elements, (W, H) whose product approximates the non-negative matrix X. This factorization can be used for example for dimensionality reduction, source separation or topic extraction.
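A minimal sketch of this factorization with scikit-learn's NMF class follows. The matrix X and the choice of two components are assumptions for illustration, and X must contain no NaN values for this to run.

```python
import numpy as np
from sklearn.decomposition import NMF

# Small non-negative matrix (hypothetical data, no missing values).
X = np.array([
    [1.0, 0.5, 0.0],
    [2.0, 1.0, 0.0],
    [0.0, 0.3, 1.5],
    [0.0, 0.6, 3.0],
])

# Factorize X into W (4 x 2) and H (2 x 3), both non-negative.
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(X)
H = model.components_

# The product W @ H is the low-rank approximation of X.
X_approx = W @ H
```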

Which Sklearn function can be used for imputing missing data?

We can use the SimpleImputer class from scikit-learn to replace missing values with a fill value. SimpleImputer has a parameter called strategy that offers four imputation methods: 'mean', 'median', 'most_frequent' and 'constant'. For example, strategy='mean' replaces missing values with the mean of each column.
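A minimal sketch of mean imputation with SimpleImputer is shown below, using made-up data. Keep in mind that after imputation the matrix no longer records which entries were originally missing, so the mask is saved separately here.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy matrix (hypothetical): two NaN entries and one genuine zero.
X = np.array([
    [1.0, 0.0],
    [np.nan, 4.0],
    [3.0, np.nan],
])

# Remember which entries were missing before they are filled in.
was_missing = np.isnan(X)

# Replace each NaN with the mean of the observed values in its column.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
```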


1 Answer

There is a thread about this on the scikit-learn GitHub, and a version seems to be available but has not yet been committed to the main code.

https://github.com/scikit-learn/scikit-learn/pull/8474
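Until something like that is available, one common workaround is a masked (weighted) NMF, where missing entries get zero weight in the loss so they are never confused with real zeros. The sketch below is not the code from the pull request and not a scikit-learn API; it is a plain NumPy illustration using multiplicative updates for the squared error, and the function name masked_nmf and its parameters are hypothetical.

```python
import numpy as np

def masked_nmf(X, n_components, n_iter=500, eps=1e-9, seed=0):
    """Hypothetical helper: NMF that ignores NaN entries of X."""
    mask = ~np.isnan(X)              # True for observed entries
    X0 = np.where(mask, X, 0.0)      # NaN -> 0, but those cells get zero weight
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_components)) + eps
    H = rng.random((n_components, X.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates restricted to the observed entries.
        WH = (W @ H) * mask
        H *= (W.T @ X0) / (W.T @ WH + eps)
        WH = (W @ H) * mask
        W *= (X0 @ H.T) / (WH @ H.T + eps)
    return W, H

# Toy rank-1 matrix (hypothetical data) with one entry hidden as NaN.
X = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
X[1, 1] = np.nan

W, H = masked_nmf(X, n_components=1)
X_hat = W @ H   # X_hat[1, 1] is the model's prediction for the missing cell
```

Because the mask multiplies both the data and the reconstruction, a real zero still pulls the fit toward zero, while a missing entry contributes nothing and is simply predicted by W @ H afterwards.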

Cristiana SP answered Sep 27 '22 21:09