
Group and find all values that belong to n unique maximum values

My dataframe:

data = {'Input':[133217,133217,133217,133217,133217,133217,132426,132426,132426,132426,132426,132426,132426,132426],
 'Font':[30,25,25,21,20,19,50,50,50,38,38,30,30,29]}

     Input  Font
0   133217    30
1   133217    25
2   133217    25
3   133217    21
4   133217    20
5   133217    19
6   132426    50
7   132426    50
8   132426    50
9   132426    38
10  132426    38
11  132426    30
12  132426    30
13  132426    29

I would like to create a new dataframe containing only the rows whose Font value is among the three largest distinct Font values for its Input. For example, the three largest distinct Font values for Input 133217 are 30, 25 and 21.

Expected output:

op_data = {'Input':[133217,133217,133217,133217,132426,132426,132426,132426,132426,132426,132426],
 'Font':[30,25,25,21,50,50,50,38,38,30,30]}

     Input  Font
0   133217    30
1   133217    25
2   133217    25
3   133217    21
4   132426    50
5   132426    50
6   132426    50
7   132426    38
8   132426    38
9   132426    30
10  132426    30

I've tried this with groupby from pandas:

import pandas as pd

df = pd.DataFrame(data)
df['order'] = df.groupby('Input').cumcount() + 1

I then kept the rows where df['order'] is 1, 2 or 3, but that didn't work out as planned. Is there an alternative way?

DGS asked Dec 04 '19 08:12



2 Answers

You can find the unique values for each group, take the three largest, and select the rows whose value is among them:

import numpy as np

(df.groupby('Input')['Font']
   .apply(lambda x: x[x.isin(np.sort(x.unique())[-3:])])
   .reset_index(level=0))

Output (groups are returned sorted by Input, and the original row labels are kept):

     Input  Font
6   132426    50
7   132426    50
8   132426    50
9   132426    38
10  132426    38
11  132426    30
12  132426    30
0   133217    30
1   133217    25
2   133217    25
3   133217    21
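
The same selection can also be sketched with a dense rank, which keeps the rows in their original order. This is a variation on the approach above rather than part of it: it assumes the question's data, and the names dense_rank and top3 are only illustrative.

```python
import pandas as pd

data = {'Input': [133217] * 6 + [132426] * 8,
        'Font': [30, 25, 25, 21, 20, 19, 50, 50, 50, 38, 38, 30, 30, 29]}
df = pd.DataFrame(data)

# Dense rank within each Input group: tied Font values share a rank and
# ranks have no gaps, so rank 1 is the largest Font, rank 2 the next
# distinct value, and so on.
dense_rank = df.groupby('Input')['Font'].rank(method='dense', ascending=False)

# Keep rows whose Font is among the three largest distinct values of its group.
top3 = df[dense_rank <= 3].reset_index(drop=True)
print(top3)
```

Because the rank is aligned to df's index, the boolean mask filters df directly and no sorting or re-joining is needed.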

Mykola Zotko answered Oct 16 '22 11:10


I would break the task into two steps.

The first is sorting the dataframe, although yours appears to be sorted already.

dft = df.sort_values(by=['Input', 'Font'], ascending=False)

Then group by the 'Input' column and call head(3) to get the top 3 rows of each 'Input' group. Note that head(3) counts rows, so repeated Font values each take up a slot:

dft = dft.groupby('Input').head(3)
print(dft)

    Input  Font
0  133217    30
1  133217    25
2  133217    25
6  132426    50
7  132426    50
8  132426    50
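
If the goal is the three largest distinct Font values per group, as in the question's expected output, the sort-then-head idea might be adapted by dropping duplicate (Input, Font) pairs first and then merging back. This is a rough sketch under that assumption, using the question's data; the names distinct, top_values and result are only illustrative.

```python
import pandas as pd

data = {'Input': [133217] * 6 + [132426] * 8,
        'Font': [30, 25, 25, 21, 20, 19, 50, 50, 50, 38, 38, 30, 30, 29]}
df = pd.DataFrame(data)

# One row per distinct (Input, Font) pair, largest Font first within each Input.
distinct = (df.drop_duplicates(['Input', 'Font'])
              .sort_values(['Input', 'Font'], ascending=False))

# With duplicates gone, head(3) picks the three largest distinct Font values
# of each Input group.
top_values = distinct.groupby('Input').head(3)

# An inner merge keeps every original row whose (Input, Font) pair survived,
# including the repeated ones.
result = df.merge(top_values, on=['Input', 'Font'], how='inner')
print(result)
```

Since top_values holds each (Input, Font) pair at most once, the merge does not multiply rows; it only filters df.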

powerPixie answered Oct 16 '22 12:10