I am trying to apply NMF to my dataset using Python's scikit-learn. My dataset contains both zero values and missing values, but scikit-learn does not allow NaN values in the data matrix. Some posts suggest replacing the missing values with zeros.
My questions are:
If I replace missing values with zeros, how can the algorithm tell the missing values apart from real zero values?
Are there any other NMF implementations that can deal with missing values?
Or are there any other matrix factorization algorithms that can predict the missing values?
Checking for missing values using isnull() and notnull(): To check for missing values in a pandas DataFrame, use the isnull() and notnull() methods. Both report, element by element, whether a value is NaN or not. They can also be called on a pandas Series to find null values in that series.
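As a minimal sketch of the check described above, the snippet below builds a small made-up DataFrame (the column names and values are illustrative assumptions, not the asker's data) and counts missing entries separately from genuine zeros:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: zeros are real observations, NaN marks missing entries.
df = pd.DataFrame({"a": [1.0, 0.0, np.nan], "b": [np.nan, 2.0, 0.0]})

# Boolean mask that is True wherever a value is missing.
missing_mask = df.isnull()

# Count missing entries and real zeros separately.
n_missing = int(missing_mask.sum().sum())
n_zero = int((df == 0).sum().sum())

print(missing_mask)
print("missing:", n_missing, "zeros:", n_zero)
```

Keeping the mask around before any fill step is what preserves the distinction between "missing" and "zero".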
Non-Negative Matrix Factorization (NMF). Find two non-negative matrices, i.e. matrices with all non-negative elements, (W, H) whose product approximates the non-negative matrix X. This factorization can be used for example for dimensionality reduction, source separation or topic extraction.
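For illustration, here is a small sketch of that factorization with scikit-learn's `NMF` class on a complete (NaN-free) matrix. The data is random and the choice of `n_components=2` is an arbitrary assumption made for the example:

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative data with no missing values.
rng = np.random.default_rng(0)
X = rng.random((6, 4))

# Factor X into W (6 x 2) and H (2 x 4), both non-negative.
model = NMF(n_components=2, init="random", random_state=0, max_iter=1000)
W = model.fit_transform(X)
H = model.components_

# The product W @ H approximates X.
X_approx = W @ H
print(np.abs(X - X_approx).max())
```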
We can use the SimpleImputer class from scikit-learn to replace missing values with a fill value. SimpleImputer has a parameter called strategy that selects the imputation method; for example, strategy='mean' replaces missing values with the mean of the column.
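A short sketch of mean imputation on a made-up array follows. Note that this fills the gaps with a guess before factorization; it does not make the downstream algorithm aware of which entries were missing:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value in each column.
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])

# Replace each NaN with the mean of the observed values in its column.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)
```

Here the column means are computed from the observed entries only, so the first column is filled with 2.0 and the second with 6.0.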
There is a thread about this on the scikit-learn GitHub, and a version seems to be available but has not yet been committed to the main code.
https://github.com/scikit-learn/scikit-learn/pull/8474
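Until something like that lands in scikit-learn, one common workaround is a masked (weighted) NMF that only fits the observed entries. The sketch below is not the code from the pull request; it is a plain NumPy illustration using multiplicative updates on the masked squared error, and the function name `masked_nmf` and its parameters are hypothetical:

```python
import numpy as np

def masked_nmf(X, n_components, n_iter=500, eps=1e-9, seed=0):
    """Hypothetical helper: NMF fitted on the observed (non-NaN) entries only."""
    rng = np.random.default_rng(seed)
    M = ~np.isnan(X)                 # True where an entry is observed
    X0 = np.where(M, X, 0.0)         # placeholder zeros; the mask cancels them out
    W = rng.random((X.shape[0], n_components)) + eps
    H = rng.random((n_components, X.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the loss ||M * (X - W @ H)||^2.
        WH = M * (W @ H)
        H *= (W.T @ X0) / (W.T @ WH + eps)
        WH = M * (W @ H)
        W *= (X0 @ H.T) / (WH @ H.T + eps)
    return W, H

# Toy data: the 0.0 is a real observation, the NaNs are missing.
X = np.array([[1.0, 0.0, 2.0],
              [np.nan, 1.0, 3.0],
              [2.0, 1.0, np.nan],
              [3.0, 1.0, 7.0]])
W, H = masked_nmf(X, n_components=2)
X_pred = W @ H                       # entries at NaN positions are the predictions
print(X_pred)
```

Because the mask multiplies both the data and the reconstruction, real zeros contribute to the fit while missing entries do not, which addresses the first question directly.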