
Convert float to int and leave nulls

I have the following dataframe and want to convert the values in column 'b' to integers:

    a   b       c
0   1   NaN     3
1   5   7200.0  20
2   5   580.0   20

The following code throws the exception "ValueError: Cannot convert NA to integer":

df['b'] = df['b'].astype(int)

How do I convert only the floats to int and leave the nulls as they are?

billboard asked Sep 25 '16 19:09


3 Answers

NaN (np.nan) is a floating-point value, so it has to be replaced before you can create an integer pd.Series. Jeon's suggestion works great if 0 isn't a valid value in df['b']. For example:

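import pandas as pd
import numpy as np

# Sample frame with a missing value in column 'b'
df = pd.DataFrame({'a': [1, 5, 5], 'b': [np.nan, 7200.0, 580.0], 'c': [3, 20, 20]})
print(df, '\n\n')

# Replace NaN with 0, then cast the now NaN-free values to int
df['b'] = np.nan_to_num(df['b']).astype(int)

print(df)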

If there are valid 0's, you could instead fill the NaNs with some unique value that cannot occur in the data (e.g., -999999999) and then do the conversion above, so that the missing entries stay distinguishable from real zeros.

Either way, you have to remember that the column now holds placeholder values (0's or the unique value) where there were once NaNs. You will need to be careful to filter these out when doing numerical analyses (e.g., mean).
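
To illustrate that caveat, here is a minimal sketch of how a placeholder skews a statistic unless the originally missing rows are masked out. The names SENTINEL and was_missing are made up for this example, and fillna is used for the fill step only for brevity:

```python
import numpy as np
import pandas as pd

SENTINEL = -999999999  # assumed placeholder; must not occur in the real data

df = pd.DataFrame({'a': [1, 5, 5], 'b': [np.nan, 7200.0, 580.0], 'c': [3, 20, 20]})

# Remember which rows were missing before the values are overwritten.
was_missing = df['b'].isna()

# Fill the NaNs with the placeholder, then cast to a plain numpy integer dtype.
df['b'] = df['b'].fillna(SENTINEL).astype('int64')

# A naive mean includes the placeholder and is badly skewed.
naive_mean = df['b'].mean()

# Mask the placeholder rows out before computing anything numerical.
clean_mean = df.loc[~was_missing, 'b'].mean()

print(naive_mean, clean_mean)
```

Keeping the boolean mask around is one way to avoid relying on the placeholder value itself when filtering later.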

TSeymour answered Sep 21 '22 18:09


When your series contains floats and NaNs, converting it to a numpy integer dtype raises an error, because numpy integers cannot represent missing values.

DON'T DO:

df['b'] = df['b'].astype(int)

Since pandas 0.24 there is a built-in nullable integer dtype, which allows missing values in an integer column. Notice the capital 'I' in 'Int64': this is the pandas nullable integer dtype, not the numpy integer.

SO, DO THIS:

df['b'] = df['b'].astype('Int64')

More info on pandas integer na values:
https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions
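
As a small sketch of what this looks like on the dataframe from the question (assuming pandas 0.24 or later), the missing entry is kept as a missing value and is skipped by reductions such as mean:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 5, 5], 'b': [np.nan, 7200.0, 580.0], 'c': [3, 20, 20]})

# Cast to the pandas nullable integer dtype (capital "I").
df['b'] = df['b'].astype('Int64')

print(df['b'].dtype)         # Int64
print(df['b'].isna().sum())  # 1, the missing entry is still missing
print(df['b'].mean())        # 3890.0, the missing entry is skipped
```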

Sander van den Oord answered Sep 19 '22 18:09


Similar to TSeymour's answer, but using pandas' fillna:

import pandas as pd
import numpy as np

# Sample frame with a missing value in column 'b'
df = pd.DataFrame({'a': [1, 5, 5], 'b': [np.nan, 7200.0, 580.0], 'c': [3, 20, 20]})
print(df, '\n\n')

# Fill NaN with 0, then cast the column to int
df['b'] = df['b'].fillna(0).astype(int)
print(df)

Which gives:

   a       b   c
0  1     NaN   3
1  5  7200.0  20
2  5   580.0  20 


   a     b   c
0  1     0   3
1  5  7200  20
2  5   580  20

Arjaan Buijk answered Sep 19 '22 18:09