
How to add sequential counter column on groups using Pandas groupby

Tags: python, pandas

I feel like there is a better way than this:

    import pandas as pd

    df = pd.DataFrame(
        columns="   index    c1    c2    v1 ".split(),
        data= [
                [       0,  "A",  "X",    3, ],
                [       1,  "A",  "X",    5, ],
                [       2,  "A",  "Y",    7, ],
                [       3,  "A",  "Y",    1, ],
                [       4,  "B",  "X",    3, ],
                [       5,  "B",  "X",    1, ],
                [       6,  "B",  "X",    3, ],
                [       7,  "B",  "Y",    1, ],
                [       8,  "C",  "X",    7, ],
                [       9,  "C",  "Y",    4, ],
                [      10,  "C",  "Y",    1, ],
                [      11,  "C",  "Y",    6, ],]).set_index("index", drop=True)

    def callback(x):
        # Number the rows of one group 1..n
        x['seq'] = range(1, x.shape[0] + 1)
        return x

    df = df.groupby(['c1', 'c2']).apply(callback)
    print(df)

To achieve this:

       c1 c2  v1  seq
    0   A  X   3    1
    1   A  X   5    2
    2   A  Y   7    1
    3   A  Y   1    2
    4   B  X   3    1
    5   B  X   1    2
    6   B  X   3    3
    7   B  Y   1    1
    8   C  X   7    1
    9   C  Y   4    1
    10  C  Y   1    2
    11  C  Y   6    3

Is there a way to do it that avoids the callback?

Owen asked May 02 '14 19:05


People also ask

How do you count in Groupby pandas?

Use pandas DataFrame.groupby() to group the rows by a column, then call count() to get the number of values in each group. None and NaN values are ignored.
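As a minimal sketch of that behaviour (the `team` and `score` columns are made-up illustration data, not from the question), count() skips the missing value while size() counts every row:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing score in team "x".
scores = pd.DataFrame({
    "team": ["x", "x", "y", "y"],
    "score": [1.0, np.nan, 3.0, 4.0],
})

# count() ignores NaN/None; size() counts all rows in the group.
counts = scores.groupby("team")["score"].count()
sizes = scores.groupby("team").size()
print(counts)
print(sizes)
```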

Does pandas Groupby maintain order?

Yes, within groups: groupby preserves the order of rows within each group. The group_keys parameter is a separate matter: when calling apply, it adds the group keys to the index to identify the pieces.

What does Cumcount do in pandas?

cumcount() numbers each item in each group from 0 to the length of that group - 1. If the ascending parameter is False, it numbers in reverse, from the length of the group - 1 down to 0. The result is the sequence number of each element within its group.
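A small sketch of both directions, using a made-up `key` column purely for illustration:

```python
import pandas as pd

# Hypothetical data: group "a" has 3 rows, group "b" has 2.
frame = pd.DataFrame({"key": ["a", "a", "b", "a", "b"]})

# Default (ascending=True): 0, 1, 2, ... within each group.
forward = frame.groupby("key").cumcount()

# ascending=False: len(group) - 1 down to 0 within each group.
backward = frame.groupby("key").cumcount(ascending=False)

print(forward.tolist())
print(backward.tolist())
```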

Can you Groupby multiple columns in pandas?

Yes. groupby() accepts a list of column names to group by several columns at once, and the aggregate functions can then apply one or several aggregations at the same time.
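For instance, with the data from the question rebuilt compactly, grouping on both `c1` and `c2` and applying two aggregations to `v1` might look like this (the choice of `sum` and `max` is only for illustration):

```python
import pandas as pd

# The question's data, rebuilt compactly.
data = pd.DataFrame({
    "c1": list("AAAABBBBCCCC"),
    "c2": list("XXYYXXXYXYYY"),
    "v1": [3, 5, 7, 1, 3, 1, 3, 1, 7, 4, 1, 6],
})

# Group by two columns and compute two aggregations of v1 per group.
summary = data.groupby(["c1", "c2"])["v1"].agg(["sum", "max"])
print(summary)
```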


2 Answers

Use cumcount(); see the pandas documentation for GroupBy.cumcount.

    In [4]: df.groupby(['c1', 'c2']).cumcount()
    Out[4]:
    0     0
    1     1
    2     0
    3     1
    4     0
    5     1
    6     2
    7     0
    8     0
    9     0
    10    1
    11    2
    dtype: int64

If you want the numbering to start at 1, add 1:

    In [5]: df.groupby(['c1', 'c2']).cumcount()+1
    Out[5]:
    0     1
    1     2
    2     1
    3     2
    4     1
    5     2
    6     3
    7     1
    8     1
    9     1
    10    2
    11    3
    dtype: int64
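To get the `seq` column the question asks for without a callback, the result can be assigned straight back to the frame. A minimal sketch, rebuilding the question's data compactly (the default integer index stands in for the original `index` column):

```python
import pandas as pd

# The question's data, rebuilt compactly.
df = pd.DataFrame({
    "c1": list("AAAABBBBCCCC"),
    "c2": list("XXYYXXXYXYYY"),
    "v1": [3, 5, 7, 1, 3, 1, 3, 1, 7, 4, 1, 6],
})

# cumcount() numbers rows within each (c1, c2) group from 0;
# adding 1 gives the 1-based counter from the question.
df["seq"] = df.groupby(["c1", "c2"]).cumcount() + 1
print(df)
```

Because cumcount() returns a Series aligned to the original index, the assignment keeps the rows in their original order.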
Jeff answered Sep 30 '22 07:09


This might be useful:

    df = df.sort_values(['userID', 'date'])
    grp = df.groupby('userID')['ItemID'].aggregate(lambda x: '->'.join(tuple(x))).reset_index()
    print(grp)

It creates one row per userID, with that user's ItemID values joined in date order by '->'.
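Since the original screenshot is not available, here is a sketch of what this approach produces. The `userID`, `date`, and `ItemID` values below are invented for illustration, and the join assumes `ItemID` holds strings:

```python
import pandas as pd

# Hypothetical data; ItemID values must be strings for str.join to work.
events = pd.DataFrame({
    "userID": [2, 1, 1, 2, 1],
    "date": ["2022-01-02", "2022-01-03", "2022-01-01", "2022-01-01", "2022-01-02"],
    "ItemID": ["d", "c", "a", "e", "b"],
})

# Sort so that each user's items appear in date order, then join per user.
events = events.sort_values(["userID", "date"])
grp = (
    events.groupby("userID")["ItemID"]
    .aggregate(lambda x: "->".join(x))
    .reset_index()
)
print(grp)
```

Note that this builds one joined string per group rather than a per-row counter, so it addresses a different need than the `seq` column in the question.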

Shaina Raza answered Sep 30 '22 08:09