I have a Pandas DataFrame of the ages of drug users. My problem: some of the ages are separated by a hyphen, for example '50-64'. I want to take the mean of the hyphen-separated numbers and replace the cell with it.
1. Is there a way to do it with some sort of loop or method? I don't want to simply hardcode drugs.loc[15, 'age'] = np.mean([50, 64])
2. For future reference, is there a more elegant way of handling data with hyphen-separated numbers?
input:
drugs.age
output:
0 12
1 13
2 14
3 15
4 16
5 17
6 18
7 19
8 20
9 21
10 22-23
11 24-25
12 26-29
13 30-34
14 35-49
15 50-64
16 65+
input:
drugs.age.dtype
output:
dtype('O')
You can use:
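1. replace to turn + into an empty string
2. split to spread the values across the columns of a DataFrame
3. astype to convert all values to float
4. DataFrame.mean to take the row-wise mean

drugs['age'] = (drugs['age'].str.replace('+', '', regex=False)
                            .str.split('-', expand=True)
                            .astype(float)
                            .mean(axis=1))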
print (drugs)
age
0 12.0
1 13.0
2 14.0
3 15.0
4 16.0
5 17.0
6 18.0
7 19.0
8 20.0
9 21.0
10 22.5
11 24.5
12 27.5
13 32.0
14 42.0
15 57.0
16 65.0
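The chain above can be tried end to end on a small sample. This is a minimal sketch, not the only way to do it: the helper name `range_midpoint` and the column `age_mid` are made up for illustration, and the sample rows are a subset of the question's data. `regex=False` is passed explicitly so that '+' is taken as a literal character whatever the installed pandas version uses as its default.

```python
import pandas as pd

# Sample rows taken from the question's `age` column (all strings, dtype object).
drugs = pd.DataFrame({'age': ['12', '21', '22-23', '26-29', '50-64', '65+']})

def range_midpoint(ages: pd.Series) -> pd.Series:
    """Hypothetical helper: mean of the hyphen-separated bounds, with '+' dropped."""
    parts = (ages.str.replace('+', '', regex=False)   # '65+'   -> '65'
                 .str.split('-', expand=True))         # '50-64' -> columns '50', '64'
    # Single ages leave the second column empty; mean(axis=1) skips missing values.
    return parts.astype(float).mean(axis=1)

# Write to a new column so the original strings stay available for checking.
drugs['age_mid'] = range_midpoint(drugs['age'])
print(drugs)
```

Wrapping the chain in a function makes it reusable for any other column that stores ranges in the same 'low-high' form.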
If some values are numeric and some are strings, first convert them all to strings:
drugs['age'] = (drugs['age'].astype(str)
                            .str.replace('+', '', regex=False)
                            .str.split('-', expand=True)
                            .astype(float)
                            .mean(axis=1))
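As a rough illustration of the mixed-type case, the sketch below builds a column that holds both integers and strings. The sample values and the variable name `mixed` are assumptions made for the example, not data from the question. The cast with `astype(str)` comes first because the `.str` methods are meant for string entries.

```python
import pandas as pd

# Hypothetical column mixing real integers with range strings (dtype object).
mixed = pd.Series([12, 13, '22-23', '65+'], name='age')

ages = (mixed.astype(str)                           # 12 -> '12', strings unchanged
             .str.replace('+', '', regex=False)     # drop the open-ended marker
             .str.split('-', expand=True)           # one column per bound
             .astype(float)
             .mean(axis=1))                         # midpoint of each row
print(ages)
```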