Group and find all values that belong to n unique maximum values

Tags: python, pandas-groupby

My dataframe:


data = {'Input':[133217,133217,133217,133217,133217,133217,132426,132426,132426,132426,132426,132426,132426,132426],
 'Font':[30,25,25,21,20,19,50,50,50,38,38,30,30,29]}

     Input  Font
0   133217    30
1   133217    25
2   133217    25
3   133217    21
4   133217    20
5   133217    19
6   132426    50
7   132426    50
8   132426    50
9   132426    38
10  132426    38
11  132426    30
12  132426    30
13  132426    29

I would like to create a new DataFrame containing only the rows whose Font value is among the 3 largest unique Font values for each Input. For example, the 3 largest unique Font values for Input 133217 are 30, 25, and 21.

Expected output:


op_data = {'Input':[133217,133217,133217,133217,132426,132426,132426,132426,132426,132426,132426],
 'Font':[30,25,25,21,50,50,50,38,38,30,30]}

     Input  Font
0   133217    30
1   133217    25
2   133217    25
3   133217    21
4   132426    50
5   132426    50
6   132426    50
7   132426    38
8   132426    38
9   132426    30
10  132426    30

I've tried this with groupby from pandas:


import pandas as pd
df = pd.DataFrame(data)
# Number the rows within each Input group, starting at 1
df['order'] = df.groupby('Input').cumcount()+1

Then I kept the rows where df['order'] is 1, 2, or 3, which didn't work out as planned. Is there an alternative way?


asked Dec 04 '19 08:12

DGS

2 Answers

You can find the unique values in each group, take the three largest, and select the rows whose value is among them:


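import numpy as np

# Keep the rows whose Font is among the three largest distinct values of its group
(df.groupby('Input')['Font']
   .apply(lambda x: x[x.isin(np.sort(x.unique())[-3:])])
   .reset_index(level=0))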

Output:


     Input  Font
6   132426    50
7   132426    50
8   132426    50
9   132426    38
10  132426    38
11  132426    30
12  132426    30
0   133217    30
1   133217    25
2   133217    25
3   133217    21
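
As a possible alternative that avoids the lambda and keeps the original row order, here is a minimal sketch (not part of the original answer) based on a dense rank within each group. It assumes `data` is the dictionary from the question; the names `rank` and `result` are only illustrative.

```python
import pandas as pd

data = {'Input': [133217, 133217, 133217, 133217, 133217, 133217,
                  132426, 132426, 132426, 132426, 132426, 132426, 132426, 132426],
        'Font': [30, 25, 25, 21, 20, 19, 50, 50, 50, 38, 38, 30, 30, 29]}
df = pd.DataFrame(data)

# Dense rank per group: equal Font values share a rank, and ranks have no gaps,
# so rank 1 is the largest distinct value, rank 2 the second largest, and so on.
rank = df.groupby('Input')['Font'].rank(method='dense', ascending=False)

# Keep rows whose Font is among the three largest distinct values of its group.
result = df[rank <= 3].reset_index(drop=True)
print(result)
```

Because the mask is aligned with the original index, the rows stay in their original order, and `reset_index(drop=True)` renumbers them from 0.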


answered Oct 16 '22 11:10

Mykola Zotko

I would break the task into two steps.

The first is sorting the DataFrame (yours already appears to be sorted):


dft = dft.sort_values(by=['Input','Font'],ascending=False)

Then group by the 'Input' column and call head(3) to get the top 3 rows for each distinct 'Input' group:


dft = dft.groupby('Input').head(3)
print(dft)

    Input  Font
0  133217    30
1  133217    25
2  133217    25
6  132426    50
7  132426    50
8  132426    50
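
Note that `head(3)` keeps three rows per group, so repeated values such as 25, 25 or 50, 50, 50 use up the slots and the output above differs from the one requested in the question. One way to adapt the same sort-then-head idea, sketched here under the assumption that `dft` is built from the question's `data` (the names `top` and `result` are illustrative, not from the original answer), is to drop duplicate (Input, Font) pairs first and then merge back:

```python
import pandas as pd

data = {'Input': [133217, 133217, 133217, 133217, 133217, 133217,
                  132426, 132426, 132426, 132426, 132426, 132426, 132426, 132426],
        'Font': [30, 25, 25, 21, 20, 19, 50, 50, 50, 38, 38, 30, 30, 29]}
dft = pd.DataFrame(data)

# One row per distinct (Input, Font) pair, sorted descending,
# then the first three distinct Font values of each Input group.
top = (dft.drop_duplicates(['Input', 'Font'])
          .sort_values(by=['Input', 'Font'], ascending=False)
          .groupby('Input')
          .head(3))

# An inner merge keeps every original row whose (Input, Font) pair is in `top`.
result = dft.merge(top, on=['Input', 'Font'], how='inner')
print(result)
```

The merge brings the duplicate rows back, so every row whose Font is one of the three largest distinct values of its Input group is retained.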

answered Oct 16 '22 12:10

powerPixie
