
pandas.DataFrame.replace changes dtype of columns

Tags: python, pandas

So I was trying to replace np.nan values in my dataframe with None and noticed that in the process the datatype of the float columns in the dataframe changed to object even when they don't contain any missing data.

As an example:

import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])
data.replace(to_replace={np.nan:None}, inplace=True)

Checking data.dtypes before and after the call to replace shows that the dtype of column B changed from float to object, whereas that of C stayed int. If I remove column A from the original data, this does not happen. Why does the dtype change, and how can I avoid this effect?
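One way to make the before/after comparison concrete is to keep the original frame and put the two sets of dtypes next to each other. This is only a minimal sketch: the names `before`, `after`, `replaced` and `comparison` are made up for illustration, and the contents of the `after` column may differ between pandas versions.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': np.nan, 'B': 1.096, 'C': 1}, index=[0])

# dtypes as constructed: A and B are float64, C is int64
before = data.dtypes

# Non-inplace replace, so the original frame stays available for comparison
replaced = data.replace({np.nan: None})
after = replaced.dtypes

# Side-by-side view; what shows up under "after" depends on the pandas version
comparison = pd.DataFrame({'before': before, 'after': after})
print(comparison)
```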

Chris asked Nov 15 '22 19:11


1 Answer

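I've come across this many times, and the fix I've used is to precede the replace with astype(object). I've had to use this for merge issues, combine issues, etc. One caveat: astype(object) returns a new DataFrame, so when replace is chained onto it with inplace=True, only that temporary copy is modified and data itself is left untouched. That is why data.info() reports the original dtypes after the chained call. To actually keep the replaced values you have to assign the result, e.g. data = data.astype(object).replace({np.nan: None}).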

import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])

# dtypes before the replace
data.info()

#<class 'pandas.core.frame.DataFrame'>
#Int64Index: 1 entries, 0 to 0
#Data columns (total 3 columns):
#A    0 non-null float64
#B    1 non-null float64
#C    1 non-null int64
#dtypes: float64(2), int64(1)
#memory usage: 32.0 bytes

# replace NaN with None in place, then check the dtypes again
data.replace(to_replace={np.nan:None}, inplace=True)

data.info()   

#<class 'pandas.core.frame.DataFrame'>
#Int64Index: 1 entries, 0 to 0
#Data columns (total 3 columns):
#A    0 non-null object
#B    1 non-null object
#C    1 non-null int64
#dtypes: int64(1), object(2)
#memory usage: 32.0+ bytes

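import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])
# astype(object) returns a new DataFrame, so this in-place replace
# modifies that temporary copy; data itself is not changed
data.astype(object).replace(to_replace={np.nan:None}, inplace=True)

data.info()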

#<class 'pandas.core.frame.DataFrame'>
#Int64Index: 1 entries, 0 to 0
#Data columns (total 3 columns):
#A    0 non-null float64
#B    1 non-null float64
#C    1 non-null int64
#dtypes: float64(2), int64(1)
#memory usage: 32.0 bytes
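The output above still shows column A as float64 with 0 non-null values, which suggests that data was not modified by the chained call. The sketch below checks that and contrasts it with assigning the result. The names `dtypes_before`, `dtypes_after_chained` and `converted` are hypothetical, and the exact dtypes of `converted` may vary with the pandas version, so treat this as an illustration and not a definitive recipe.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': np.nan, 'B': 1.096, 'C': 1}, index=[0])
dtypes_before = data.dtypes.astype(str).to_dict()

# Chained call: the in-place replace edits the temporary frame returned by
# astype(object); that temporary is then thrown away
data.astype(object).replace({np.nan: None}, inplace=True)

# data keeps its original dtypes and still holds NaN in column A
dtypes_after_chained = data.dtypes.astype(str).to_dict()
print(dtypes_after_chained == dtypes_before)
print(bool(data['A'].isna().all()))

# Assigning the result is what actually keeps the replacement
converted = data.astype(object).replace({np.nan: None})
print(converted.loc[0, 'A'] is None)
print(converted.dtypes)
```

If the original numeric dtypes matter for the columns without missing values, it may be simpler to keep `data` as it is and use `converted` only at the point where None is required, for example just before handing the values to a database driver.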
oppressionslayer answered Jan 11 '23 22:01