Grouping / Categorising ages column in Python Pandas

I have a dataframe, say df, which has a column 'Age'.

I want to group these ages and create a new column, something like this:

If age >= 0 & age < 2 then AgeGroup = Infant
If age >= 2 & age < 4 then AgeGroup = Toddler
If age >= 4 & age < 13 then AgeGroup = Kid
If age >= 13 & age < 20 then AgeGroup = Teen
and so on .....

How can I achieve this using the Pandas library?

I tried something like this:

X_train_data['AgeGroup'][ X_train_data.Age < 13 ] = 'Kid'
X_train_data['AgeGroup'][ X_train_data.Age < 3 ] = 'Toddler'
X_train_data['AgeGroup'][ X_train_data.Age < 1 ] = 'Infant'

but when I do this I get the following warning:

/Users/Anand/miniconda3/envs/learn/lib/python3.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
/Users/Anand/miniconda3/envs/learn/lib/python3.7/site-packages/ipykernel_launcher.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

How can I avoid this warning and do this in a better way?


asked Oct 11 '18 06:10

Anand Siddharth

2 Answers

Use pandas.cut with the parameter right=False so that the rightmost edge of each bin is not included:

import pandas as pd

X_train_data = pd.DataFrame({'Age': [0, 2, 4, 13, 35, -1, 54]})

# Bin edges; with right=False each bin is closed on the left: [0, 2), [2, 4), ...
bins = [0, 2, 4, 13, 20, 110]
labels = ['Infant', 'Toddler', 'Kid', 'Teen', 'Adult']
X_train_data['AgeGroup'] = pd.cut(X_train_data['Age'], bins=bins, labels=labels, right=False)
print(X_train_data)
   Age AgeGroup
0    0   Infant
1    2  Toddler
2    4      Kid
3   13     Teen
4   35    Adult
5   -1      NaN
6   54    Adult
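
To see what right=False changes, here is a small sketch that cuts the same ages both ways. The sample ages and the variable names (ages, left_closed, right_closed, comparison) are made up for illustration; only pd.cut and its right parameter come from the answer.

```python
import pandas as pd

# Illustrative ages sitting exactly on the bin edges.
ages = pd.Series([0, 2, 4, 13, 20])
bins = [0, 2, 4, 13, 20, 110]
labels = ['Infant', 'Toddler', 'Kid', 'Teen', 'Adult']

# right=False: bins are [0, 2), [2, 4), ... so an age equal to an edge
# falls into the bin that starts at that edge.
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

# right=True (the default): bins are (0, 2], (2, 4], ... so an age equal to
# an edge falls into the bin that ends there, and 0 matches no bin at all.
right_closed = pd.cut(ages, bins=bins, labels=labels)

comparison = pd.DataFrame({
    'Age': ages,
    'right=False': left_closed,
    'right=True': right_closed,
})
print(comparison)
```

With the rules in the question (age >= 2 and age < 4 is a Toddler), the left-closed variant is the one that matches.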

Finally, to replace the missing values, use add_categories with fillna:

# Parentheses allow the method chain to continue on the next line.
X_train_data['AgeGroup'] = (X_train_data['AgeGroup'].cat.add_categories('unknown')
                                                    .fillna('unknown'))
print(X_train_data)
   Age AgeGroup
0    0   Infant
1    2  Toddler
2    4      Kid
3   13     Teen
4   35    Adult
5   -1  unknown
6   54    Adult
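
The add_categories step is there because pd.cut returns a categorical column, and a categorical only accepts values that are already among its categories. A minimal sketch of that behaviour follows; the two-row series and the names age_group, fill_failed and filled are illustrative assumptions, and the exact exception type raised by fillna has differed between pandas versions, so both are caught.

```python
import pandas as pd

# Illustrative data: 1 lands in [0, 2), while -1 is outside every bin and becomes NaN.
age_group = pd.cut(pd.Series([1, -1]), bins=[0, 2, 4],
                   labels=['Infant', 'Toddler'], right=False)

# 'unknown' is not yet one of the categories, so filling with it is rejected.
try:
    age_group.fillna('unknown')
    fill_failed = False
except (TypeError, ValueError):
    fill_failed = True
print(fill_failed)

# Registering the category first makes the fill legal.
filled = age_group.cat.add_categories('unknown').fillna('unknown')
print(filled.tolist())
print(list(filled.cat.categories))
```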

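Alternatively, add an extra bin so that -1 is labelled 'unknown' directly (values below -1, or 110 and above, would still become NaN):

bins = [-1, 0, 2, 4, 13, 20, 110]
labels = ['unknown', 'Infant', 'Toddler', 'Kid', 'Teen', 'Adult']
X_train_data['AgeGroup'] = pd.cut(X_train_data['Age'], bins=bins, labels=labels, right=False)

print(X_train_data)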
   Age AgeGroup
0    0   Infant
1    2  Toddler
2    4      Kid
3   13     Teen
4   35    Adult
5   -1  unknown
6   54    Adult
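
If ages can fall outside the range -1 to 110, one possible variation is to make the outer bins open-ended. This is only a sketch: treating every negative age as 'unknown' and every age of 20 or more as 'Adult' is an assumption and not part of the answer above, and the sample ages are made up.

```python
import numpy as np
import pandas as pd

# Illustrative ages, including values outside the -1 to 110 range.
ages = pd.Series([-5, -1, 0, 19, 20, 110, 130])

# Infinite outer edges: [-inf, 0) catches all negatives, [20, inf) all adults.
bins = [-np.inf, 0, 2, 4, 13, 20, np.inf]
labels = ['unknown', 'Infant', 'Toddler', 'Kid', 'Teen', 'Adult']

age_group = pd.cut(ages, bins=bins, labels=labels, right=False)
print(pd.DataFrame({'Age': ages, 'AgeGroup': age_group}))
```

Ages that are missing (NaN) to begin with would still come out as NaN and would need the fillna step.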


answered Sep 21 '22 21:09

jezrael

Just use .loc, which selects the rows and the column in a single indexing step:

X_train_data.loc[X_train_data.Age < 13, 'AgeGroup'] = 'Kid'
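
Extended to all the groups, the same idea might look like the sketch below. The thresholds follow the rules listed in the question (2, 4, 13, 20), not the ones in the attempted code. The 'Adult' default for everything from 20 up is an assumption standing in for the question's "and so on", and the sample ages are illustrative.

```python
import pandas as pd

X_train_data = pd.DataFrame({'Age': [0, 2, 4, 13, 35]})

# Start with the default label (assumed here to be 'Adult'), then apply the
# conditions from widest to narrowest so each narrower one overwrites the last.
X_train_data['AgeGroup'] = 'Adult'
X_train_data.loc[X_train_data['Age'] < 20, 'AgeGroup'] = 'Teen'
X_train_data.loc[X_train_data['Age'] < 13, 'AgeGroup'] = 'Kid'
X_train_data.loc[X_train_data['Age'] < 4, 'AgeGroup'] = 'Toddler'
X_train_data.loc[X_train_data['Age'] < 2, 'AgeGroup'] = 'Infant'
print(X_train_data)
```

Each assignment goes through one .loc call on the dataframe itself, so no intermediate copy is written to. The order of the statements matters: reversing it would label every age below 20 as 'Teen'.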

answered Sep 22 '22 21:09

quest
