Convert string with NaNs to int in pandas

Tags: python, pandas

I have a pandas DataFrame in which every value is a string. Some are the string 'None', and the rest are integers in string form, such as '123456'. How can I convert every 'None' to np.nan and the other values to integers, such as 123456?

df = {'col1': ['1', 'None'], 'col2': ['None', '123']}

Convert df to:

df = {'col1': [1, NaN], 'col2': [NaN, 123]}
Ting Wang asked Apr 09 '19 02:04


People also ask

Can NaN be an integer?

No, NaN is a floating point value. Every possible value of an int is a number.


2 Answers

Use the following code:

print(df.replace('None', np.nan).astype(float))

Output:

   col1   col2
0   1.0    NaN
1   NaN  123.0

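replace turns every 'None' string into np.nan, and astype(float) then parses the remaining numeric strings. The result is float rather than int because NaN is a floating-point value, so a plain integer column cannot hold it.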

P.S. If df is a dictionary, as in the question, convert it to a DataFrame first:

df = pd.DataFrame(df)
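
Putting these pieces together, a self-contained sketch might look like the following. The helper name strings_to_float is hypothetical (it is not part of pandas), and the sketch assumes that missing values are always spelled exactly 'None'.

```python
import numpy as np
import pandas as pd


def strings_to_float(data):
    """Hypothetical helper: turn 'None' strings into NaN and the rest into floats."""
    frame = pd.DataFrame(data)  # accepts a dict or an existing DataFrame
    # Swap the literal string 'None' for a real missing value, then parse
    # the remaining numeric strings as floats (NaN forces a float dtype).
    return frame.replace('None', np.nan).astype(float)


result = strings_to_float({'col1': ['1', 'None'], 'col2': ['None', '123']})
print(result)
print(result.dtypes)
```

If other spellings of a missing value can occur (for example an empty string), they would need their own replace entries; that case is not covered here.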
U12-Forward answered Sep 29 '22 01:09


You can convert your columns to the nullable integer type (new in pandas 0.24):

import pandas as pd

d = {'col1': ['1', 'None'], 'col2': ['None', '123']}
# errors='coerce' turns unparseable strings such as 'None' into NaN
res = pd.DataFrame(
    {k: pd.to_numeric(v, errors='coerce') for k, v in d.items()},
    dtype='Int32')
res

   col1  col2
0     1   NaN
1   NaN   123

With this solution, the numeric data is stored as integers, while missing entries stay missing (shown as NaN here; pandas 1.0 and later display them as <NA>):

res.to_dict(orient='list')
# {'col1': [1, nan], 'col2': [nan, 123]}
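
For a DataFrame that already exists, the same idea can be applied column by column. This is only a sketch, assuming a pandas version that supports nullable integers (0.24 or later); the name to_nullable_int is hypothetical, and Int64 is used instead of Int32 simply to leave more room for large values.

```python
import pandas as pd


def to_nullable_int(frame):
    """Hypothetical helper: parse string columns into nullable Int64 columns."""
    # to_numeric with errors='coerce' yields floats, with NaN where the
    # string (such as 'None') cannot be parsed; astype('Int64') then stores
    # the whole numbers as integers and keeps the gaps as missing values.
    return pd.DataFrame({
        name: pd.to_numeric(col, errors='coerce').astype('Int64')
        for name, col in frame.items()
    })


df = pd.DataFrame({'col1': ['1', 'None'], 'col2': ['None', '123']})
res = to_nullable_int(df)
print(res.dtypes)
print(res['col1'].isna().tolist())
```

Note that errors='coerce' silently turns any unparseable string into a missing value, not just 'None', so typos in the data would be hidden as well.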

On older versions of pandas, pass dtype=object when initialising the DataFrame:

res = pd.DataFrame({
    k: pd.to_numeric(v, errors='coerce') for k, v in d.items()}, dtype=object)
res

  col1 col2
0    1  NaN
1  NaN  123

This differs from the nullable types solution above: only the displayed representation changes, not the underlying data, so the values are still floats.

res.to_dict(orient='list')
#  {'col1': [1.0, nan], 'col2': [nan, 123.0]}
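
One way to see the difference between the two approaches is to inspect the type of a stored value. The sketch below rebuilds both frames from the same parsed data; the variable names are illustrative, and Int64 is an assumption standing in for any nullable integer dtype.

```python
import pandas as pd

d = {'col1': ['1', 'None'], 'col2': ['None', '123']}
parsed = {k: pd.to_numeric(v, errors='coerce') for k, v in d.items()}

# Object dtype: the floats are kept as they are, only the column dtype changes.
as_object = pd.DataFrame(parsed, dtype=object)
# Nullable integer dtype: the values are actually converted to integers.
as_nullable = pd.DataFrame(parsed).astype('Int64')

obj_value = as_object['col1'].iloc[0]
int_value = as_nullable['col1'].iloc[0]
print(type(obj_value), type(int_value))
```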
cs95 answered Sep 28 '22 23:09