
Keep X% last rows by group in Pandas

It's straightforward to keep the last N rows for every group in a dataframe with something like df.groupby('ID').tail(N).

In my case, groups have different sizes and I would like to keep the same % of each group rather than same number of rows.

e.g if we want to keep the last 50% rows for each group (based on ID) for the following :

df = pd.DataFrame({'ID' : ['A','A','B','B','B','B','B','B'],
                   'value' : [1,2,10,11,12,13,14,15]})

The result would be :

pd.DataFrame({'ID' : ['A','B','B','B'],
              'value' : [2,13,14,15]})

How can we get to that?

EDIT: If x% of the group size is not an int, we round down to the nearest int.
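That rounding rule matches plain int() truncation, which floors positive values (a quick sketch, not from the original post):

```python
ratio = 0.5
for size in [2, 5, 6]:
    # int() truncates toward zero, which floors positive values
    print(size, int(ratio * size))
# 2 1
# 5 2
# 6 3
```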

mlx asked Mar 10 '21


2 Answers

groupby-apply-tail

Pass the desired size to tail() in a GroupBy.apply(). This is simpler than the iloc method below since it cleanly handles the "last 0 rows" case.

ratio = 0.6
(df.groupby('ID')
   .apply(lambda x: x.tail(int(ratio * len(x))))
   .reset_index(drop=True))

#   ID  value
# 0  A      2
# 1  B     13
# 2  B     14
# 3  B     15
ratio = 0.4
(df.groupby('ID')
   .apply(lambda x: x.tail(int(ratio * len(x))))
   .reset_index(drop=True))

#   ID  value
# 0  B     14
# 1  B     15

groupby-apply-iloc

Alternatively, slice off the desired size with negative indexing, but this is clunkier: [-0:] returns the whole frame rather than the last 0 rows, so we have to guard against that case explicitly:

ratio = 0.6
(df.groupby('ID')
   .apply(lambda x: x[-int(ratio * len(x)):] if int(ratio * len(x)) else None)
   .reset_index(drop=True))

#   ID  value
# 0  A      2
# 1  B     13
# 2  B     14
# 3  B     15
ratio = 0.4
(df.groupby('ID')
   .apply(lambda x: x[-int(ratio * len(x)):] if int(ratio * len(x)) else None)
   .reset_index(drop=True))

#   ID  value
# 0  B     14
# 1  B     15
tdy answered Sep 27 '22


As noted in the comments, there is no built-in option for this. You can do something like:

groups = df.groupby('ID')

enums = groups.cumcount().add(1)
sizes = groups['ID'].transform('size')

df[enums/sizes > 0.5]

Output:

  ID  value
1  A      2
5  B     13
6  B     14
7  B     15
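The > 0.5 mask keeps the last half of the even-sized groups here, but on an odd-sized group the fraction comparison rounds up (e.g. with size 5 it keeps 3 rows). A sketch generalizing to an arbitrary ratio while rounding down per the question's edit, by comparing the within-group rank to the floored cutoff (the generalization is mine, not from the answer):

```python
import pandas as pd

df = pd.DataFrame({'ID': ['A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
                   'value': [1, 2, 10, 11, 12, 13, 14, 15]})

ratio = 0.5
groups = df.groupby('ID')
enums = groups.cumcount().add(1)              # 1-based rank within each group
sizes = groups['ID'].transform('size')        # group size broadcast to each row

# keep rows ranked after size - floor(ratio * size) within their group
result = df[enums > sizes - (ratio * sizes).astype(int)]
print(result)
#   ID  value
# 1  A      2
# 5  B     13
# 6  B     14
# 7  B     15
```

With ratio = 0.5 this reproduces the output above; on a 5-row group it would keep floor(2.5) = 2 rows, whereas the enums/sizes > 0.5 mask would keep 3.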
Quang Hoang answered Sep 27 '22