
Replace value in any column in pandas dataframe

Tags: python, pandas

In the dataframe below:

 T2MN   T2MX   RH2M DFP2M RAIN 
 6.96   9.32  84.27  5.57    - 
 6.31  10.46  -      5.63    - 
  -    10.66  79.38  3.63    - 
 0.79   4.45  94.24  1.85    - 
 1.45   3.99  91.71  1.17    - 

How do I replace every - with NaN? I do not want to specify column names, since I do not know beforehand which columns will contain a -.

user308827 asked Dec 11 '22 12:12


1 Answer

If those are strings, then your floats are probably also strings.

Assuming your dataframe is df, I'd try

pd.to_numeric(df.stack(), errors='coerce').unstack()
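To illustrate why swapping the dash for NaN is probably not enough on its own, here is a minimal sketch. The two-column sample and the names `txt` and `replaced` are made up for illustration and are not the asker's data.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical two-column sample in the same layout as the question.
txt = """T2MN T2MX
6.96 9.32
- 10.66
"""
df = pd.read_csv(StringIO(txt), sep=r"\s+")

# Literal replacement: the dash becomes NaN, but nothing else is converted.
replaced = df.replace("-", np.nan)

# The remaining entry in T2MN is still a Python str, not a float.
print(type(replaced.loc[0, "T2MN"]))
# The former dash is now reported as missing.
print(pd.isna(replaced.loc[1, "T2MN"]))
```

So a plain replace answers the literal question but leaves values such as '6.96' as text, which is why a numeric conversion is suggested instead.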

Deeper explanation

Pandas doesn't usually represent missing floats with '-', so each '-' must be a string. Any column containing a '-' therefore has dtype 'object', which makes it highly likely that whatever parsed the data left the floats in that column as strings too.
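If the data is being read with pd.read_csv, one possible alternative is to declare the dash as a missing-value marker at parse time, so the columns never become strings in the first place. This is a sketch under that assumption; the shortened sample and the name `df_parsed` are illustrative only.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample mirroring the question's layout.
txt = """T2MN T2MX RH2M
6.96 9.32 84.27
6.31 10.46 -
- 10.66 79.38
"""

# na_values tells the parser to treat a bare "-" as missing,
# so every column can be inferred as a float column right away.
df_parsed = pd.read_csv(StringIO(txt), sep=r"\s+", na_values="-")

print(df_parsed.dtypes)
print(df_parsed)
```

This only helps when you control the parsing step; for a dataframe that already exists, the conversion approach in this answer still applies.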

Setup

from io import StringIO
import pandas as pd

txt = """T2MN   T2MX   RH2M DFP2M RAIN 
 6.96   9.32  84.27  5.57    - 
 6.31  10.46  -      5.63    - 
  -    10.66  79.38  3.63    - 
 0.79   4.45  94.24  1.85    - 
 1.45   3.99  91.71  1.17    - """

df = pd.read_csv(StringIO(txt), sep=r"\s+")  # split columns on runs of whitespace
print(df)

   T2MN   T2MX   RH2M  DFP2M RAIN
0  6.96   9.32  84.27   5.57    -
1  6.31  10.46      -   5.63    -
2     -  10.66  79.38   3.63    -
3  0.79   4.45  94.24   1.85    -
4  1.45   3.99  91.71   1.17    -

What are the dtypes?

print(df.dtypes)

T2MN      object
T2MX     float64
RH2M      object
DFP2M    float64
RAIN      object
dtype: object

What is the type of the first element?

print(type(df.iloc[0, 0]))

<class 'str'>

This means that any column containing a '-' is really a column of strings that look like floats. Use pd.to_numeric with errors='coerce' to turn every non-numeric item into np.nan. pd.to_numeric accepts one-dimensional input (a Series, list, or array) but not a pd.DataFrame, so we stack the frame into a single Series, convert it, and unstack it back.

pd.to_numeric(df.stack(), errors='coerce').unstack()

   T2MN   T2MX   RH2M  DFP2M  RAIN
0  6.96   9.32  84.27   5.57   NaN
1  6.31  10.46    NaN   5.63   NaN
2   NaN  10.66  79.38   3.63   NaN
3  0.79   4.45  94.24   1.85   NaN
4  1.45   3.99  91.71   1.17   NaN
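
Since each column of a dataframe is itself a Series, another way to get around the one-dimensional restriction is to apply pd.to_numeric column by column. The sketch below compares that with the stack/unstack route on a shortened, made-up sample; `by_column` and `by_stack` are illustrative names.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample in the same layout as the question.
txt = """T2MN T2MX RH2M RAIN
6.96 9.32 84.27 -
6.31 10.46 - -
- 10.66 79.38 -
"""
df = pd.read_csv(StringIO(txt), sep=r"\s+")

# Column-wise: apply hands each column to pd.to_numeric as a Series
# and forwards errors="coerce" as a keyword argument.
by_column = df.apply(pd.to_numeric, errors="coerce")

# Stack/unstack route, for comparison.
by_stack = pd.to_numeric(df.stack(), errors="coerce").unstack()

print(by_column)
print(by_column.dtypes)
print(by_stack)
```

Both routes need no column names up front. Which one reads better is a matter of taste; the column-wise form avoids reshaping the frame.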

piRSquared answered Dec 13 '22 21:12