I have a dataframe with a column of floats that I want to convert to int:
> df['VEHICLE_ID'].head()
0 8659366.0
1 8659368.0
2 8652175.0
3 8652174.0
4 8651488.0
In theory I should just be able to use:
> df['VEHICLE_ID'] = df['VEHICLE_ID'].astype(int)
But I get:
Output: ValueError: Cannot convert NA to integer
But I am pretty sure that there are no NaNs in this series:
> df['VEHICLE_ID'].fillna(999,inplace=True)
> df[df['VEHICLE_ID'] == 999]
> Output: Empty DataFrame
Columns: [VEHICLE_ID]
Index: []
What's going on?
Basically, the error is telling you that the column contains NaN
values, and I will show why your attempts didn't reveal this:
In [7]:
# set up some data (np.nan is the spelling that works on all NumPy versions)
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})
df
Out[7]:
a
0 1.0
1 NaN
2 3.0
3 4.0
now try to cast:
df['a'].astype(int)
this raises:
ValueError: Cannot convert NA to integer
but then you tried something like this:
In [5]:
for index, row in df['a'].items():
    if row == np.nan:
        print('index:', index, 'isnull')
This printed nothing, but NaN cannot be detected with an equality test. It has the special property that comparing it
against itself with == returns False, so row != row is True only for NaN:
In [6]:
for index, row in df['a'].items():
    if row != row:
        print('index:', index, 'isnull')
index: 1 isnull
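The self-inequality is a property of floating-point NaN in general, not something specific to pandas. A minimal sketch with plain Python floats (the variable name x is only for illustration):

```python
import math

# NaN is the only float value that is not equal to itself.
x = float('nan')

print(x == x)         # False
print(x != x)         # True
print(math.isnan(x))  # True
```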
With the != test the loop prints the NaN row. For readability, you should use pd.isnull
instead:
In [9]:
for index, row in df['a'].items():
    if pd.isnull(row):
        print('index:', index, 'isnull')
index: 1 isnull
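The loop is useful for seeing what is going on, but the same check can probably be done without iterating. A rough sketch, reusing the toy column 'a' from above (the names nan_mask, nan_count and nan_index are just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# Boolean mask that is True wherever the value is missing
nan_mask = df['a'].isna()

# How many NaNs there are, and which index labels they sit at
nan_count = int(nan_mask.sum())
nan_index = df.index[nan_mask].tolist()

print(nan_count)
print(nan_index)
```

Running the count on the real column before calling astype(int) should tell you right away whether the cast can succeed.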
So what to do? We can drop the rows with df.dropna(subset=['a']), or we can replace the NaN values using fillna:
In [8]:
df['a'].fillna(0).astype(int)
Out[8]:
0 1
1 0
2 3
3 4
Name: a, dtype: int32
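For completeness, here is a sketch of the dropna route, plus one further option that keeps the missing entries instead of replacing them. The second variant assumes a pandas version that ships the nullable 'Int64' extension dtype (note the capital I), which is not what the original code used. The names dropped and nullable are only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# Option 1: drop the rows with NaN in 'a', then cast to a plain integer dtype
dropped = df.dropna(subset=['a'])['a'].astype(int)

# Option 2 (assumes nullable integer support): keep the row, stored as <NA>
nullable = df['a'].astype('Int64')

print(dropped)
print(nullable)
```

Which one fits depends on whether a filler value like 0 could be mistaken for a real ID in your data.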