How to compute the mean of hyphen-separated numbers in a pandas DataFrame?

Tags:

python

pandas

I have a pandas DataFrame of the ages of drug users. My problem: some of the ages are ranges separated by a hyphen, for example '50-64'. I want to compute the mean of the hyphen-separated numbers and replace the cell with it.

1. Is there a way to do it with some sort of loop or method? I don't want to simply hardcode drugs.loc[15, 'age'] = np.mean([50, 64])

2. For future reference, is there a more elegant way of handling data with hyphen-separated numbers?

input:
drugs.age
output:
0        12
1        13
2        14
3        15
4        16
5        17
6        18
7        19
8        20
9        21
10    22-23
11    24-25
12    26-29
13    30-34
14    35-49
15    50-64
16      65+

input:
drugs.age.dtype
output:
dtype('O')
Gen Tan asked Dec 08 '25 18:12



1 Answer

You can use:

  • replace + with an empty string
  • split the values on - into a DataFrame with str.split(expand=True)
  • cast all values to float with astype
  • take the mean of each row with DataFrame.mean

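# The parentheses let the method chain span several lines.
# regex=False treats '+' as a literal character, not a regex quantifier.
drugs['age'] = (drugs['age'].str.replace('+', '', regex=False)
                            .str.split('-', expand=True)
                            .astype(float)
                            .mean(axis=1))
print(drugs)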
     age
0   12.0
1   13.0
2   14.0
3   15.0
4   16.0
5   17.0
6   18.0
7   19.0
8   20.0
9   21.0
10  22.5
11  24.5
12  27.5
13  32.0
14  42.0
15  57.0
16  65.0
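
To make the steps above easier to follow, here is a minimal self-contained sketch that rebuilds the sample data from the question and runs each step separately. The names ages, stripped, parts, numeric and age_mean are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question's output (all values as strings)
ages = [str(n) for n in range(12, 22)] + [
    '22-23', '24-25', '26-29', '30-34', '35-49', '50-64', '65+']
drugs = pd.DataFrame({'age': ages})

# Step 1: drop the trailing '+'; regex=False treats '+' as a literal character
stripped = drugs['age'].str.replace('+', '', regex=False)

# Step 2: split on '-'; rows holding a single number get a missing second column
parts = stripped.str.split('-', expand=True)

# Step 3: cast to float so the missing entries become NaN
numeric = parts.astype(float)

# Step 4: row-wise mean; NaN values are skipped by default
drugs['age_mean'] = numeric.mean(axis=1)
print(drugs.tail(7))
```

Writing to a separate column (age_mean here) instead of overwriting age keeps the original labels available for checking the result.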

If some values are numeric and some are strings, first convert them all to strings, because the .str methods return NaN for non-string values:

# astype(str) first, so numeric cells are not turned into NaN by the .str methods
drugs['age'] = (drugs['age'].astype(str)
                            .str.replace('+', '', regex=False)
                            .str.split('-', expand=True)
                            .astype(float)
                            .mean(axis=1))
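
A small sketch of why the conversion matters, assuming a column that mixes integers and strings. The Series named mixed and its four values are made up for illustration.

```python
import pandas as pd

# Hypothetical column mixing integers and strings
mixed = pd.Series([12, 13, '22-23', '65+'], dtype=object)

# Without astype(str), the integer entries come back as NaN from .str methods
lost = mixed.str.replace('+', '', regex=False)
print(lost.isna().tolist())

# Converting to strings first keeps every row
means = (mixed.astype(str)
              .str.replace('+', '', regex=False)
              .str.split('-', expand=True)
              .astype(float)
              .mean(axis=1))
print(means.tolist())
```

The first print shows which rows would be lost; the second shows that all rows survive once every value is a string.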
jezrael answered Dec 10 '25 22:12
