My dataframe:
data = {'Input':[133217,133217,133217,133217,133217,133217,132426,132426,132426,132426,132426,132426,132426,132426],
'Font':[30,25,25,21,20,19,50,50,50,38,38,30,30,29]}
Input Font
0 133217 30
1 133217 25
2 133217 25
3 133217 21
4 133217 20
5 133217 19
6 132426 50
7 132426 50
8 132426 50
9 132426 38
10 132426 38
11 132426 30
12 132426 30
13 132426 29
I would like to create a new data frame containing only the rows whose Font value is among the 3 largest unique Font values for each Input. For example, the 3 largest unique Font values for Input 133217 are 30, 25, 21.
Expected output:
op_data = {'Input':[133217,133217,133217,133217,132426,132426,132426,132426,132426,132426,132426],
'Font':[30,25,25,21,50,50,50,38,38,30,30]}
Input Font
0 133217 30
1 133217 25
2 133217 25
3 133217 21
4 132426 50
5 132426 50
6 132426 50
7 132426 38
8 132426 38
9 132426 30
10 132426 30
I've tried this with groupby from pandas:
import pandas as pd
df = pd.DataFrame(data)
# number the rows within each Input group, starting at 1
df['order'] = df.groupby('Input').cumcount()+1
Then I kept only the rows where df['order'] is 1, 2 or 3, which didn't work out as planned. Is there an alternative way?
To get the maximum value of each group, you can directly apply the pandas max() function to the selected column(s) from the result of pandas groupby.
Pandas comes with a whole host of SQL-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.
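As a minimal sketch of the two points above, using the question's data: max() gives one value per group, and agg() applies several aggregations at once. The variable names group_max and summary are only illustrative. Note that this returns a single row per Input, so on its own it does not answer the question of keeping every row in the top 3 unique values.

```python
import pandas as pd

data = {'Input': [133217, 133217, 133217, 133217, 133217, 133217,
                  132426, 132426, 132426, 132426, 132426, 132426, 132426, 132426],
        'Font': [30, 25, 25, 21, 20, 19, 50, 50, 50, 38, 38, 30, 30, 29]}
df = pd.DataFrame(data)

# Largest Font value per Input (one row per group)
group_max = df.groupby('Input')['Font'].max()

# Several aggregations at once: the maximum and the number of distinct Font values
summary = df.groupby('Input')['Font'].agg(['max', 'nunique'])

print(group_max)
print(summary)
```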
You can find the unique Font values within each group, take the three largest of them, and keep the rows whose Font is in that list:
import numpy as np

# per Input, keep rows whose Font is among the three largest unique values
(df.groupby('Input')['Font']
   .apply(lambda x: x[x.isin(np.sort(x.unique())[-3:])])
   .reset_index(level=0))
Output:
Input Font
6 132426 50
7 132426 50
8 132426 50
9 132426 38
10 132426 38
11 132426 30
12 132426 30
0 133217 30
1 133217 25
2 133217 25
3 133217 21
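The output above comes back ordered by group key rather than in the original row order. If the original order matters, one possible variant (a sketch, not the answerer's code) is to compute a per-row cutoff with transform and filter with a boolean mask. The names threshold and result are illustrative.

```python
import pandas as pd

data = {'Input': [133217, 133217, 133217, 133217, 133217, 133217,
                  132426, 132426, 132426, 132426, 132426, 132426, 132426, 132426],
        'Font': [30, 25, 25, 21, 20, 19, 50, 50, 50, 38, 38, 30, 30, 29]}
df = pd.DataFrame(data)

# For each Input, find the smallest of its three largest unique Font values
# and broadcast that cutoff back onto every row of the group.
threshold = df.groupby('Input')['Font'].transform(
    lambda s: s.drop_duplicates().nlargest(3).min()
)

# Keep rows at or above their group's cutoff; row order and index are preserved.
result = df[df['Font'] >= threshold]
print(result)
```

If a group has fewer than three unique values, the cutoff is simply its smallest value, so all of its rows are kept.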
I would break the task into 2 steps.
The first is sorting the dataframe (yours already appears to be sorted):
dft = df.sort_values(by=['Input','Font'],ascending=False)
Then group by the 'Input' column and call head(3) to get the first 3 rows of each distinct 'Input' group. Note that this counts rows, not unique values, so duplicates use up slots: Input 133217 yields 30, 25, 25 rather than 30, 25, 25, 21.
dft = dft.groupby('Input').head(3)
print(dft)
Input Font
0 133217 30
1 133217 25
2 133217 25
6 132426 50
7 132426 50
8 132426 50
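Because head(3) returns only three rows per group, it does not reproduce the expected output in the question. One way to rank by unique values instead is a dense rank within each group, where tied values share a rank and no ranks are skipped. This is a sketch using the question's data; the names rank and top3 are illustrative.

```python
import pandas as pd

data = {'Input': [133217, 133217, 133217, 133217, 133217, 133217,
                  132426, 132426, 132426, 132426, 132426, 132426, 132426, 132426],
        'Font': [30, 25, 25, 21, 20, 19, 50, 50, 50, 38, 38, 30, 30, 29]}
df = pd.DataFrame(data)

# Dense rank of Font within each Input, largest value first:
# equal values get the same rank, and ranks increase by 1 per distinct value.
rank = df.groupby('Input')['Font'].rank(method='dense', ascending=False)

# Keep rows whose value is among the three largest distinct values,
# then renumber the index from 0.
top3 = df[rank <= 3].reset_index(drop=True)
print(top3)
```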