
cannot convert nan to int (but there are no nans)

Tags:

pandas

I have a dataframe with a column of floats that I want to convert to int:

> df['VEHICLE_ID'].head()
0    8659366.0
1    8659368.0
2    8652175.0
3    8652174.0
4    8651488.0

In theory I should just be able to use:

> df['VEHICLE_ID'] = df['VEHICLE_ID'].astype(int)

But I get:

Output: ValueError: Cannot convert NA to integer

But I am pretty sure that there are no NaNs in this series:

> df['VEHICLE_ID'].fillna(999,inplace=True)
> df[df['VEHICLE_ID'] == 999]
> Output: Empty DataFrame
Columns: [VEHICLE_ID]
Index: []

What's going on?

ale19 asked Feb 01 '17 16:02



1 Answer

Basically, the error is telling you that you have NaN values, and I will show why your attempts didn't reveal this:

In [7]:
# set up some data
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})
df
Out[7]:
     a
0  1.0
1  NaN
2  3.0
3  4.0

Now try to cast:

df['a'].astype(int)

This raises:

ValueError: Cannot convert NA to integer

But then you tried something like this:

In [5]:
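for index, row in df['a'].items():  # items() replaces iteritems(), which was removed in pandas 2.0
    if row == np.nan:  # always False: NaN never compares equal to anything
        print('index:', index, 'isnull')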

This prints nothing, because NaN cannot be tested with equality. It has the special property that comparing it against itself returns False, so row != row is True only for NaN:

In [6]:
for index, row in df['a'].items():
    if row != row:  # True only when row is NaN
        print('index:', index, 'isnull')

index: 1 isnull

Now it prints the row. For readability, use pd.isnull instead:

In [9]:
for index, row in df['a'].items():
    if pd.isnull(row):
        print('index:', index, 'isnull')

index: 1 isnull
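
The loop above is mainly for illustration. As a minimal sketch on the same toy frame, the check can also be done without a loop by building a boolean mask with isna(). The names mask, nan_count and nan_index are made up for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# Boolean mask: True where the value is missing
mask = df['a'].isna()

nan_count = int(mask.sum())          # how many values are missing
nan_index = df.index[mask].tolist()  # index labels of the rows holding them

print(nan_count, nan_index)
```

A non-zero count here means astype(int) will fail on that column.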

So what to do? We can drop the rows with df.dropna(subset=['a']), or we can replace the NaN values using fillna:

In [8]:
df['a'].fillna(0).astype(int)

Out[8]:
0    1
1    0
2    3
3    4
Name: a, dtype: int32
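
For comparison, here is a rough sketch of the dropna route, plus one alternative that is not part of the approach above: pandas' nullable integer dtype 'Int64' (capital I). It assumes a reasonably recent pandas version and float values that are all whole numbers. The names dropped and nullable are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# Option 1: drop the rows with a missing value, then cast to a plain int
dropped = df.dropna(subset=['a'])['a'].astype(int)

# Option 2 (alternative technique): keep every row and use the nullable
# integer dtype, which stores the missing entry as <NA> instead of failing
nullable = df['a'].astype('Int64')

print(dropped.tolist())
print(nullable.dtype)
```

Dropping changes the length of the series, while 'Int64' keeps it and leaves the gap visible, so which one fits depends on whether the missing rows still matter downstream.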
EdChum answered Nov 17 '22 09:11
