sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

People also ask

How do you fix "Input contains NaN, infinity or a value too large for dtype('float64')"?

To solve this error, check your data set for NaN values using numpy.isnan() and for non-finite values (NaN or infinity) using numpy.isfinite(). You can replace NaN values using numpy.nan_to_num() if your data is in a NumPy array, or with scikit-learn's SimpleImputer.
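
As a minimal sketch of that check-and-replace workflow (the array `X` and its values are made up for illustration, and the mean strategy is just one possible choice):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical feature matrix with one NaN and one infinite entry.
X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, np.inf]])

has_nan = np.isnan(X).any()        # True if any element is NaN
all_finite = np.isfinite(X).all()  # False if any element is NaN or +/-inf

# Option 1: nan_to_num replaces NaN with 0.0 and +/-inf with the
# largest/smallest finite float64 values (its defaults).
X_zeroed = np.nan_to_num(X)

# Option 2: turn inf into NaN first, then let SimpleImputer fill each
# missing entry with the mean of its column.
X_nan = np.where(np.isfinite(X), X, np.nan)
X_imputed = SimpleImputer(strategy="mean").fit_transform(X_nan)
```

Replacing infinity with a huge finite number (option 1) passes the finiteness check but can still distort a model, so imputing is often the safer choice.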

How do you check if a value is NaN in Python?

The math.isnan() function checks whether a value is NaN (Not a Number). It returns True if the value is NaN and False otherwise.
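
A short illustration for a single scalar value (the variable name `value` is arbitrary):

```python
import math

value = float("nan")

print(math.isnan(value))  # True
print(math.isnan(1.5))    # False

# NaN is not equal to itself, so == cannot be used to detect it.
print(value == value)     # False
```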

What is NaN value?

NaN stands for Not a Number and is a common way to represent a missing value in data. It is a special floating-point value, so it cannot be stored in an integer or any other non-float type. NaN values are a frequent source of problems in data analysis.
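
The float-only nature of NaN can be seen in a small sketch like this one (the Series `s` is a made-up example):

```python
import numpy as np
import pandas as pd

print(type(np.nan))  # <class 'float'>

# A column of integers is stored as float64 once it holds a missing value.
s = pd.Series([1, 2, None])
print(s.dtype)  # float64

# Converting NaN to int is not possible and raises ValueError.
raised = False
try:
    int(np.nan)
except ValueError:
    raised = True
```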

This might happen inside scikit-learn, depending on what you're doing. I recommend reading the documentation for the functions you're using. You might be using one that requires, for example, a positive definite matrix, and your matrix may not fulfill that criterion.

EDIT: How could I miss that:

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

is obviously wrong. The correct checks are:

np.any(np.isnan(mat))

and

np.all(np.isfinite(mat))

You want to check whether any of the elements is NaN, not whether the return value of the any function is a number...
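
To see the difference concretely, here is a small sketch on a made-up matrix `mat` that contains both a NaN and an infinity:

```python
import numpy as np

mat = np.array([[1.0, np.nan], [np.inf, 4.0]])

# Wrong order: mat.any() and mat.all() collapse the matrix to a single
# boolean first, so isnan/isfinite only inspect True or False.
wrong_nan = np.isnan(mat.any())        # False, although mat has a NaN
wrong_finite = np.isfinite(mat.all())  # True, although mat has an inf

# Right order: test every element, then reduce.
right_nan = np.any(np.isnan(mat))        # True
right_finite = np.all(np.isfinite(mat))  # False
```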

I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:

df = df.reset_index()

I encountered this issue many times when I removed some entries in my df, such as

df = df[df.label=='desired_one']
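
The answer does not say why resetting the index helps. One plausible mechanism, offered here only as an assumption, is index misalignment: after filtering, the surviving rows keep their old labels, and combining them with data that has a fresh 0..n-1 index introduces NaN. The frames `df` and `extra` below are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0],
                   "label": ["a", "b", "a", "b"]})
df = df[df.label == "a"]  # surviving index labels are 0 and 2

# Hypothetical extra feature built separately, with a fresh index 0, 1.
extra = pd.DataFrame({"z": [10.0, 20.0]})

# concat aligns on index labels; labels 1 and 2 exist in only one
# frame each, so the result contains NaN.
misaligned = pd.concat([df[["x"]], extra], axis=1)

# Resetting the index first makes the rows line up again.
aligned = pd.concat([df[["x"]].reset_index(drop=True), extra], axis=1)
```

Note that `reset_index()` without `drop=True` keeps the old index as a new column named `index`, which would then be passed to the model as a feature unless it is removed.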

This is my function (based on this) to clean the dataset of NaN, Inf, and missing cells (for skewed datasets):

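import pandas as pd
import numpy as np

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    # Drop rows with missing values (note: this modifies df in place).
    df.dropna(inplace=True)
    # Keep only rows without NaN or +/-Inf. Pass axis by keyword:
    # newer pandas versions reject the positional form any(1).
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)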
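
Because `dropna(inplace=True)` also removes rows from the caller's DataFrame, a variant that leaves the input untouched may be preferable in some code bases. The following is a sketch of such a variant, not the original author's function; the name `clean_dataset_copy` and the sample data are hypothetical:

```python
import numpy as np
import pandas as pd

def clean_dataset_copy(df):
    """Return a cleaned float64 copy; the input frame is not modified."""
    # Treat +/-inf as missing, then drop every row with a missing value.
    cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()
    return cleaned.astype(np.float64)

df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0],
                   "b": [1.0, 2.0, np.inf, 4.0]})
cleaned = clean_dataset_copy(df)
```

Here `cleaned` keeps only the rows with index labels 0 and 3, while `df` still has all four rows.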

This is the check on which it fails:

https://github.com/scikit-learn/scikit-learn/blob/0.17.X/sklearn/utils/validation.py#L51

Which says

def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
    # false positives from overflow in sum method.
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
            and not np.isfinite(X).all()):
        raise ValueError("Input contains NaN, infinity"
                         " or a value too large for %r." % X.dtype)

So make sure that your input contains no NaN values, that all values are actually floats, and that none of them is Inf.
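
The two-step logic of that check (cheap sum first, element-wise test only if the sum is not finite) can be tried in isolation. The helper `is_all_finite` below is a hypothetical re-implementation for illustration, not scikit-learn's API:

```python
import numpy as np

def is_all_finite(X):
    """Hypothetical helper mirroring the quoted two-step check."""
    X = np.asanyarray(X)
    with np.errstate(over="ignore", invalid="ignore"):
        total = X.sum()
    if np.isfinite(total):
        return True  # fast path: a finite sum means no NaN or inf
    # Slow path: the sum may only have overflowed, so test each element.
    return bool(np.isfinite(X).all())

ok = is_all_finite(np.array([1.0, 2.0]))
overflow_only = is_all_finite(np.array([1e308, 1e308]))  # sum is inf
with_nan = is_all_finite(np.array([1.0, np.nan]))

# "Too large for dtype": a finite float64 can become inf when cast down.
with np.errstate(over="ignore"):
    as_float32 = np.array([1e300]).astype(np.float32)
```

The last lines hint at why finite float64 data can still trigger the error: 1e300 does not fit into float32, so an estimator that converts its input to float32 internally would see infinity.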

In most cases, getting rid of infinite and null values solves this problem.

Get rid of infinite values:

df.replace([np.inf, -np.inf], np.nan, inplace=True)

Get rid of null values however you like: fill them with a specific value such as 999 or the mean, or write your own function to impute missing values:

df.fillna(999, inplace=True)
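
Put together, and using the per-column mean instead of a sentinel such as 999, the two steps might look like this (the DataFrame is a made-up example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.inf, 3.0],
                   "b": [np.nan, 2.0, 4.0]})

# Step 1: treat +/-inf as missing.
df = df.replace([np.inf, -np.inf], np.nan)

# Step 2: fill each column's gaps with that column's mean
# (mean() skips NaN by default).
df = df.fillna(df.mean())

print(df)
```

A sentinel like 999 is simple but becomes an outlier the model has to cope with; the mean keeps the filled values inside the observed range.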

Tags:

python

python-2.7

scikit-learn

valueerror
