Pandas: ValueError: cannot convert float NaN to integer

Tags:

python

pandas

csv

I get ValueError: cannot convert float NaN to integer for the following:

df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)

  • "x" is a column in the CSV file, but I cannot spot any float NaN in the file, and I don't understand what the error means.
  • When I read the column as a string, it has values like -1, 0, 1, ... 2000, which all look like valid integers to me.
  • When I read the column as float, it loads. The values then show as -1.0, 0.0, etc., and there are still no NaNs.
  • I tried error_bad_lines=False and the dtype parameter in read_csv, to no avail. Loading just aborts with the same exception.
  • The file is not small (10+ M rows), so I cannot inspect it manually. When I extract a small part from the head of the file there is no error, but it happens with the full file. So it is something in the file, but I cannot detect what.
  • Logically the CSV should not have missing values, but even if there is some garbage I would be OK with skipping those rows, or at least identifying them. I do not see a way to scan through the file and report conversion errors.

Update: Using the hints in the comments/answers, I got my data clean with this:

# x contained NaN
df = df[~df['x'].isnull()]

# Y contained some other garbage, so null check was not enough
df = df[df['y'].str.isnumeric()]

# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)
JaakL asked Nov 16 '17 15:11


1 Answer

To identify the NaN values, use boolean indexing:

print(df[df['x'].isnull()]) 

Then, to get rid of all non-numeric values, use to_numeric with the parameter errors='coerce', which replaces non-numeric values with NaN:

df['x'] = pd.to_numeric(df['x'], errors='coerce') 
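If the goal is to see which rows are the problem before changing anything, the same coercion can be used as a mask. This is a minimal sketch, assuming the column was read as text. The sample values and the names `converted` and `bad_rows` are made up for illustration and are not from the question's file.

```python
import pandas as pd

# Hypothetical sample standing in for the real file; values are strings.
df = pd.DataFrame({"x": ["-1", "0", "abc", None, "2000", "12a"]})

# Coerce into a separate Series so the original text stays inspectable.
converted = pd.to_numeric(df["x"], errors="coerce")

# Rows whose value could not be parsed, or was missing to begin with.
bad_rows = df[converted.isna()]
print(bad_rows)
```

Note that signed values such as "-1" survive this check, whereas a filter based on str.isnumeric() would reject them, because "-1".isnumeric() is False in Python.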

To remove all rows with NaN in column x, use dropna:

df = df.dropna(subset=['x']) 

Finally, convert the values to int:

df['x'] = df['x'].astype(int) 
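Put together, the steps might look like the sketch below. It reads a small in-memory CSV instead of zoom11.csv; the CSV text and the `n_bad` counter are assumptions added for illustration, while the pandas calls are the ones used above.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for zoom11.csv:
# one empty x value and one garbage x value.
csv_text = "x,y\n-1,10\n0,20\n,30\noops,40\n2000,50\n"
df = pd.read_csv(io.StringIO(csv_text))

# Non-numeric text becomes NaN; values that were already missing stay NaN.
df["x"] = pd.to_numeric(df["x"], errors="coerce")

# Count the unusable rows before dropping them.
n_bad = int(df["x"].isna().sum())
print(f"dropping {n_bad} rows with unusable x")

# Drop those rows, then the integer conversion no longer hits NaN.
df = df.dropna(subset=["x"])
df["x"] = df["x"].astype(int)
print(df)
```

The conversion fails in the original code because NaN is a float value with no integer equivalent, so every NaN has to be dropped or filled before astype(int) can succeed.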
jezrael answered Sep 20 '22 02:09