
pandas cut results in NaN values

I have the following column in my store_data DataFrame. Many of its values are missing and are marked with '?':

>>>store_data['trestbps']
0      140
1      130
2      132
3      142
4      110
5      120
6      150
7      180
8      120
9      160
10     126
11     140
12     110
13       ?

I replaced all the missing values with -999:

store_data.replace('?', -999, inplace = True)

>>>store_data['trestbps']
0       140
1       130
2       132
3       142
4       110
5       120
6       150
7       180
8       120
9       160
10      126
11      140
12      110
13     -999

Now I want to bin the values. I used the code below, but every value in the output is NaN:

trestbps = store_data['trestbps']
trestbps_bins = [-999,120,140,200]
store_data['trestbps'] = pd.cut(trestbps,trestbps_bins)
>>>store_data['trestbps']
0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
5      NaN
6      NaN
7      NaN
8      NaN
9      NaN
10     NaN
11     NaN
12     NaN
13     NaN

The categories work fine when there are no missing values. I want rows 0 to 12 to be binned and only row 13 to show -999. How can I achieve this?

user91 asked Nov 01 '25 16:11


1 Answer

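IIUC, you can leave the sentinel out of the bins so that the missing row comes out of pd.cut as NaN, then add 999 as an extra category and fill the NaN with it. Note that with bins starting at -999 the first interval is (-999, 120], which is open on the left, so -999 itself would not be binned anyway.

bins = [0, 120, 140, 200]  # bin edges, sentinel left out
df.trestbps = pd.cut(df.trestbps, bins)  # values outside the bins become NaN
df.trestbps = df.trestbps.cat.add_categories(999)  # register 999 as an extra category
df.trestbps.fillna(999)  # returns the filled Series; assign it back to keep the result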

0     (120, 140]
1     (120, 140]
2     (120, 140]
3     (140, 200]
4       (0, 120]
5       (0, 120]
6     (140, 200]
7     (140, 200]
8       (0, 120]
9     (140, 200]
10    (120, 140]
11    (120, 140]
12      (0, 120]
13           999
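
For completeness, here is a minimal end-to-end sketch of the same idea. It assumes, which the question does not confirm, that the column was read in as strings because of the '?' entries; that would be one plausible reason the original cut produced no usable bins. The sample frame and the trestbps_binned column name are made up for illustration, and -999 is used as the fill value because that is what the question asked for.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: strings, with '?' for missing.
store_data = pd.DataFrame(
    {"trestbps": ["140", "130", "132", "142", "110", "120", "150",
                  "180", "120", "160", "126", "140", "110", "?"]}
)

# Convert to numbers; '?' cannot be parsed, so it becomes NaN.
values = pd.to_numeric(store_data["trestbps"], errors="coerce")

# Bin only the real measurements; NaN stays NaN after the cut.
binned = pd.cut(values, bins=[0, 120, 140, 200])

# Register the sentinel as an extra category, then use it for the missing rows.
binned = binned.cat.add_categories([-999]).fillna(-999)

# Illustrative column name, so the original values are not overwritten.
store_data["trestbps_binned"] = binned
print(store_data["trestbps_binned"])

# Side check: the left edge of the first bin is excluded by default, so a
# numeric -999 is not binned by [-999, 120, ...] either.
edge_check = pd.cut(pd.Series([-999, 110]), bins=[-999, 120, 140, 200])
print(edge_check.isna().tolist())
```

Writing the result to a separate column keeps the raw readings available; if you do want -999 inside the bins instead, passing include_lowest=True to pd.cut closes the left edge of the first interval.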
anky answered Nov 04 '25 21:11


