Find half of each group with Pandas GroupBy

I need to select half of each group in a DataFrame using groupby, where the size of each group is unknown and may vary across groups. For example:

       index  summary  participant_id
0     130599     17.0              13
1     130601     18.0              13
2     130603     16.0              13
3     130605     15.0              13
4     130607     15.0              13
5     130609     16.0              13
6     130611     17.0              13
7     130613     15.0              13
8     130615     17.0              13
9     130617     17.0              13
10     86789     12.0              14
11     86791      8.0              14
12     86793     21.0              14
13     86795     19.0              14
14     86797     20.0              14
15     86799      9.0              14
16     86801     10.0              14
20    107370      1.0              15
21    107372      2.0              15
22    107374      2.0              15
23    107376      4.0              15
24    107378      4.0              15
25    107380      7.0              15
26    107382      6.0              15
27    107597      NaN              15
28    107384     14.0              15

The sizes of the groups from groupby('participant_id') are 10, 7, 9 for participant_id 13, 14, 15 respectively. What I need is to take only the FIRST half (that is, the first floor(N/2) rows) of each group.

From my (very limited) experience with Pandas groupby, it should be something like:

df.groupby('participant_id')[['summary','participant_id']].apply(lambda x: x[:k_i])

where k_i is half the size of each group. Is there a simple way to find k_i?

Arnold Klein asked Jun 27 '17 19:06


2 Answers

If I understand correctly, you can slice each group with .iloc, using half the group size (integer division, // 2) inside a lambda:

df.groupby('participant_id').apply(lambda x: x.iloc[:len(x) // 2])

Output:

                    index  summary  participant_id
participant_id                                    
13             0   130599     17.0              13
               1   130601     18.0              13
               2   130603     16.0              13
               3   130605     15.0              13
               4   130607     15.0              13
14             10   86789     12.0              14
               11   86791      8.0              14
               12   86793     21.0              14
15             20  107370      1.0              15
               21  107372      2.0              15
               22  107374      2.0              15
               23  107376      4.0              15
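To make the slicing step explicit, here is a minimal sketch that does the same thing without apply, by iterating over the groups and concatenating the first floor(N/2) rows of each. The frame below is a reconstruction of the question's data for illustration only: the 'index' column is left out, and the names halves and result are assumptions, not part of the original answer.

```python
import pandas as pd

# Reconstruction of the question's data (the 'index' column is omitted).
# Row labels 17-19 are skipped on purpose, as in the question.
df = pd.DataFrame(
    {
        "summary": [17, 18, 16, 15, 15, 16, 17, 15, 17, 17,
                    12, 8, 21, 19, 20, 9, 10,
                    1, 2, 2, 4, 4, 7, 6, float("nan"), 14],
        "participant_id": [13] * 10 + [14] * 7 + [15] * 9,
    },
    index=list(range(17)) + list(range(20, 29)),
)

# Iterating a groupby yields (key, sub-DataFrame) pairs.
# For each group keep the first floor(N/2) rows, where N = len(group).
halves = [group.iloc[: len(group) // 2] for _, group in df.groupby("participant_id")]

# Stack the pieces back together; the original row labels are preserved.
result = pd.concat(halves)
print(result)
```

Unlike the apply version, this keeps the original flat index rather than adding participant_id as an extra index level.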
Scott Boston answered Sep 28 '22 04:09


You could group by participant_id and use the transform method to check whether each row's position within its group falls in the first half. This creates a boolean Series aligned with the original rows, which you can then use to filter your original dataframe.

import numpy as np
# True for rows whose position within the group is below floor(N/2)
criteria = df.groupby('participant_id')['participant_id']\
             .transform(lambda x: np.arange(len(x)) < len(x) // 2)
df[criteria]

     index  summary  participant_id
0   130599     17.0              13
1   130601     18.0              13
2   130603     16.0              13
3   130605     15.0              13
4   130607     15.0              13
10   86789     12.0              14
11   86791      8.0              14
12   86793     21.0              14
20  107370      1.0              15
21  107372      2.0              15
22  107374      2.0              15
23  107376      4.0              15
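The same boolean mask can also be built without a lambda. The sketch below is a variation, not the answer's own code: it uses cumcount() for each row's position within its group and transform('size') for the group size. The frame is a reconstruction of the question's data with the 'index' column omitted, and the names position, half and first_half are illustrative.

```python
import pandas as pd

# Reconstruction of the question's data (the 'index' column is omitted).
df = pd.DataFrame(
    {
        "summary": [17, 18, 16, 15, 15, 16, 17, 15, 17, 17,
                    12, 8, 21, 19, 20, 9, 10,
                    1, 2, 2, 4, 4, 7, 6, float("nan"), 14],
        "participant_id": [13] * 10 + [14] * 7 + [15] * 9,
    },
    index=list(range(17)) + list(range(20, 29)),
)

grouped = df.groupby("participant_id")

# Position of each row within its group: 0, 1, 2, ...
position = grouped.cumcount()

# Group size broadcast to every row, then halved with floor division.
# 'size' counts rows, so the NaN in 'summary' does not shrink group 15.
half = grouped["participant_id"].transform("size") // 2

# Keep rows whose position is below floor(N/2) for their group.
first_half = df[position < half]
print(first_half)
```

Both position and half share the index of df, so the comparison lines up row by row and the result can be used directly as a filter.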
Ted Petrou answered Sep 27 '22 04:09