
Trying to create a grouped variable in Python

I have a column of age values that I need to convert to age ranges of 18-29, 30-39, 40-49, 50-59, 60-69, and 70+.

For example, some of the data in my DataFrame 'file' looks like this:

[image]

and I would like to get to:

[image]

I tried the following:

file['agerange'] = file[['age']].apply(lambda x: "18-29" if (x[0] > 16
                                       or x[0] < 30) else "other")

I would prefer not to just do a groupby, since the bucket sizes aren't uniform, but I'd be open to that as a solution if it works.

Thanks in advance!

Josh asked Feb 08 '23 08:02


2 Answers

It looks like you are using the pandas library. It includes a function for doing this, pandas.cut: http://pandas.pydata.org/pandas-docs/version/0.16.0/generated/pandas.cut.html

Here's my attempt:

import pandas as pd

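ages = pd.DataFrame([81, 42, 18, 55, 23, 35], columns=['age'])

# Bin edges; 120 is just a generous upper bound for the 70+ bucket.
bins = [18, 30, 40, 50, 60, 70, 120]
labels = ['18-29', '30-39', '40-49', '50-59', '60-69', '70+']
# right=False makes each bin left-closed, e.g. [18, 30), so an age of 30 is labelled '30-39'.
ages['agerange'] = pd.cut(ages['age'], bins, labels=labels, right=False)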

print(ages)

   age agerange
0   81      70+
1   42    40-49
2   18    18-29
3   55    50-59
4   23    18-29
5   35    30-39
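
One thing to watch: pd.cut closes bins on the right by default, so with edges like 30 and 40 an age of exactly 30 lands in the lower bucket unless right=False is passed. Here is a minimal sketch of a reusable version, assuming left-closed bins and an unbounded top bucket. The helper name add_age_range and the use of float("inf") as the last edge are my own choices for illustration, not something from the question.

```python
import pandas as pd


def add_age_range(df, column="age"):
    """Return a copy of df with an 'agerange' column (hypothetical helper)."""
    # Left-closed bins: [18, 30), [30, 40), ..., [70, inf)
    bins = [18, 30, 40, 50, 60, 70, float("inf")]
    labels = ["18-29", "30-39", "40-49", "50-59", "60-69", "70+"]
    out = df.copy()
    out["agerange"] = pd.cut(out[column], bins=bins, labels=labels, right=False)
    return out


# Boundary ages are the interesting cases; 17 falls outside every bin.
ages = pd.DataFrame({"age": [17, 18, 29, 30, 69, 70, 105]})
result = add_age_range(ages)
print(result)
```

Ages below the first edge come back as NaN, so you may want to decide separately how to label those rows.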
aego answered Feb 24 '23 13:02


Wouldn't a nested loop be the simplest solution here?

import random
ages = [random.randint(18, 100) for _ in range(100)]
age_ranges = [(18,29), (30,39), (40,49), (50,59), (60,69),(70,)]

for a in ages:
    for r in age_ranges:
        # Both bounds are inclusive; the one-element tuple (70,) has no upper bound.
        if a >= r[0] and (len(r) == 1 or a <= r[1]):
            print(a, r)
            break
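
If the inner loop bothers you, the same lookup can probably be done with the standard library's bisect module instead. This is only a sketch: the names lower_edges, labels and age_range are made up for illustration, and it assumes the bucket edges are sorted in ascending order.

```python
from bisect import bisect_right

# Lower edge of each bucket, ascending, with a matching label per edge.
lower_edges = [18, 30, 40, 50, 60, 70]
labels = ["18-29", "30-39", "40-49", "50-59", "60-69", "70+"]


def age_range(age):
    """Return the bucket label for an age, or None if it is below 18."""
    # bisect_right counts how many lower edges are <= age.
    idx = bisect_right(lower_edges, age) - 1
    return labels[idx] if idx >= 0 else None


for a in [17, 18, 29, 30, 69, 70, 100]:
    print(a, age_range(a))
```

This does a binary search per age rather than scanning every range, which only matters if there are many buckets.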
igon answered Feb 24 '23 12:02