Find half of each group with Pandas GroupBy

I need to select the first half of each group in a DataFrame using groupby, where the size of each group is unknown and may vary across groups. For example:

       index  summary  participant_id
0     130599     17.0              13
1     130601     18.0              13
2     130603     16.0              13
3     130605     15.0              13
4     130607     15.0              13
5     130609     16.0              13
6     130611     17.0              13
7     130613     15.0              13
8     130615     17.0              13
9     130617     17.0              13
10     86789     12.0              14
11     86791      8.0              14
12     86793     21.0              14
13     86795     19.0              14
14     86797     20.0              14
15     86799      9.0              14
16     86801     10.0              14
20    107370      1.0              15
21    107372      2.0              15
22    107374      2.0              15
23    107376      4.0              15
24    107378      4.0              15
25    107380      7.0              15
26    107382      6.0              15
27    107597      NaN              15
28    107384     14.0              15

The group sizes from groupby('participant_id') are 10, 7, and 9 for participant_id 13, 14, and 15 respectively. What I need is to take only the first half, i.e. the first floor(N/2) rows, of each group.

From my (very limited) experience with Pandas groupby, it should be something like:

df.groupby('participant_id')[['summary','participant_id']].apply(lambda x: x[:k_i])

where k_i is half the size of the i-th group, rounded down. Is there a simple way to find k_i?

asked Jun 27 '17 19:06

2 Answers

IIUC, you can slice each group positionally with iloc, using the group size // 2 inside the lambda:

df.groupby('participant_id').apply(lambda x: x.iloc[:x.participant_id.size//2])

Output:

                    index  summary  participant_id
participant_id                                    
13             0   130599     17.0              13
               1   130601     18.0              13
               2   130603     16.0              13
               3   130605     15.0              13
               4   130607     15.0              13
14             10   86789     12.0              14
               11   86791      8.0              14
               12   86793     21.0              14
15             20  107370      1.0              15
               21  107372      2.0              15
               22  107374      2.0              15
               23  107376      4.0              15
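The same slicing can be written as an explicit loop over the groups, which makes each k_i visible. This is only a sketch under assumptions: the small frame below is a hypothetical stand-in for the question's data, and the names half_sizes and first_half are made up for illustration.

```python
import pandas as pd

# Hypothetical small frame standing in for the question's data
df = pd.DataFrame({
    "summary": [17.0, 18.0, 16.0, 15.0, 12.0, 8.0, 21.0],
    "participant_id": [13, 13, 13, 13, 14, 14, 14],
})

# k_i = floor(N_i / 2) for each group, indexed by participant_id
half_sizes = df.groupby("participant_id").size() // 2

# Take the first k_i rows of each group; the original row labels are kept
first_half = pd.concat(
    [group.iloc[: half_sizes.loc[pid]]
     for pid, group in df.groupby("participant_id")]
)
print(first_half)
```

Unlike the apply version, this keeps the original flat index instead of adding participant_id as an outer index level.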

answered Sep 28 '22 04:09

You could group by participant_id and use the transform method to check whether each row's position within its group falls in the first half. This creates a boolean Series aligned with the original index, which you can then use to filter the original DataFrame.

import numpy as np
# True where a row's position within its group is below floor(N/2)
criteria = df.groupby('participant_id')['participant_id']\
             .transform(lambda x: np.arange(len(x)) < len(x) // 2)
df[criteria]

     index  summary  participant_id
0   130599     17.0              13
1   130601     18.0              13
2   130603     16.0              13
3   130605     15.0              13
4   130607     15.0              13
10   86789     12.0              14
11   86791      8.0              14
12   86793     21.0              14
20  107370      1.0              15
21  107372      2.0              15
22  107374      2.0              15
23  107376      4.0              15
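A possible variant of the same mask that avoids the lambda, sketched with the built-in cumcount and transform('size'). It is a different route to the same boolean Series, not the code above; the small frame and the names position, group_size and first_half are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical small frame standing in for the question's data
df = pd.DataFrame({
    "summary": [17.0, 18.0, 16.0, 15.0, 12.0, 8.0, 21.0],
    "participant_id": [13, 13, 13, 13, 14, 14, 14],
})

grouped = df.groupby("participant_id")

# 0-based position of each row within its group
position = grouped.cumcount()

# Size of the group each row belongs to, broadcast back to the rows
group_size = grouped["participant_id"].transform("size")

# Keep rows whose position is below floor(N/2) of their group
first_half = df[position < group_size // 2]
print(first_half)
```

Both helper Series share the index of df, so the comparison lines up row by row.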

answered Sep 27 '22 04:09

Ted Petrou

Tags: python, pandas, pandas-groupby, split-apply-combine

Arnold Klein

2 Answers

Scott Boston

Ted Petrou
