Find the column names which have top 3 largest values for each row

Tags:

python

For example the data look like:

df={'a1':[5,6,3,2,5],'a2':[23,43,56,2,6], 'a3':[4,2,3,6,7], 'a4':[1,2,1,3,2],'a5':[4,98,23,5,7],'a6':[5,43,3,2,5]}
x=pd.DataFrame(df)
Out[260]: 
    a1  a2  a3  a4  a5  a6
0   5  23   4   1   4   5
1   6  43   2   2  98  43
2   3  56   3   1  23   3
3   2   2   6   3   5   2
4   5   6   7   2   7   5

I need the result to look like:

top1 top2 top3
a2   a1   a6
a5   a2   a6
....

I've seen an answer to a previous question (see below) that recommends idxmax, but how do I handle the top n values (n > 1)?

Find the column name which has the maximum value for each row

Update:

I found the answer very useful, but my data is long, so I had to find a way around that. I ended up saving the data to a CSV file and then reading it back in chunks. Here is the code I used:

data = pd.read_csv('xxx.csv', chunksize=1000)
rslt = pd.DataFrame(np.zeros((0,3)), columns=['top1','top2','top3'])
for chunk in data:
    x=pd.DataFrame(chunk).T
    for i in x.columns:
        df1row = pd.DataFrame(x.nlargest(3, i).index.tolist(), index=['top1','top2','top3']).T
        rslt = pd.concat([rslt, df1row], axis=0)
rslt=rslt.reset_index(drop=True)
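
The chunked loop above grows rslt one row at a time, which gets slow on long data. As a rough sketch only, the same idea can collect one small frame per chunk and concatenate once at the end. The in-memory csv_text below is a stand-in for the real file (an assumption made so the example runs on its own), and the stable argsort on negated values is a swapped-in technique that assumes signed numeric columns.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the large CSV file (hypothetical data); a real path
# could be passed to read_csv instead of the StringIO object.
csv_text = """a1,a2,a3,a4,a5,a6
5,23,4,1,4,5
6,43,2,2,98,43
3,56,3,1,23,3
2,2,6,3,5,2
5,6,7,2,7,5
"""

labels = ['top1', 'top2', 'top3']
pieces = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Column positions of the three largest values in each row of the chunk.
    # Negating gives descending order; the stable sort keeps the left-most
    # column first when values are tied.
    order = np.argsort(-chunk.to_numpy(), axis=1, kind="stable")[:, :3]
    pieces.append(pd.DataFrame(chunk.columns.to_numpy()[order],
                               index=chunk.index, columns=labels))

# Concatenate once at the end instead of growing the result row by row.
rslt = pd.concat(pieces)
print(rslt)
```

Because read_csv keeps a running row index across chunks, rslt ends up indexed 0 to 4 here without a separate reset_index call.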


asked May 28 '16 03:05

CWeeks

3 Answers

import pandas as pd
import numpy as np

df={'a1':[5,6,3,2,5],'a2':[23,43,56,2,6], 'a3':[4,2,3,6,7], 'a4':[1,2,1,3,2],'a5':[4,98,23,5,7],'a6':[5,43,3,2,5]}
df=pd.DataFrame(df)

df


   a1  a2  a3  a4  a5  a6
0   5  23   4   1   4   5
1   6  43   2   2  98  43
2   3  56   3   1  23   3
3   2   2   6   3   5   2
4   5   6   7   2   7   5

We can solve it using argsort from numpy together with the pandas apply method and a lambda function. The solution:

Tops = pd.DataFrame(
    df.apply(lambda x: list(df.columns[np.array(x).argsort()[::-1][:3]]), axis=1).to_list(),
    columns=['Top1', 'Top2', 'Top3'],
)


Tops

And we get:

  Top1 Top2 Top3
0   a2   a6   a1
1   a5   a6   a2
2   a2   a5   a6
3   a3   a5   a4
4   a5   a3   a2
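
The argsort idea above can also be applied to the whole array at once rather than row by row through apply. The following is a minimal sketch, not part of the original answer. The helper name top_n_columns is hypothetical, and negating the values assumes signed numeric data.

```python
import numpy as np
import pandas as pd


def top_n_columns(frame, n=3):
    """Return the names of the n largest-valued columns for each row.

    Hypothetical helper for illustration; assumes signed numeric columns.
    """
    values = frame.to_numpy()
    # Sort every row in descending order by sorting the negated values.
    # A stable sort keeps the left-most column first when values are tied.
    order = np.argsort(-values, axis=1, kind="stable")[:, :n]
    names = frame.columns.to_numpy()[order]
    labels = [f"top{i + 1}" for i in range(n)]
    return pd.DataFrame(names, index=frame.index, columns=labels)


df = pd.DataFrame({'a1': [5, 6, 3, 2, 5], 'a2': [23, 43, 56, 2, 6],
                   'a3': [4, 2, 3, 6, 7], 'a4': [1, 2, 1, 3, 2],
                   'a5': [4, 98, 23, 5, 7], 'a6': [5, 43, 3, 2, 5]})
tops = top_n_columns(df)
print(tops)
```

With the stable sort, ties go to the left-most column, so row 0 gives a2, a1, a6 (as in the question's expected output) rather than a2, a6, a1.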

answered Oct 26 '22 20:10

George Pipis

What you need is pandas.DataFrame.nlargest.

import pandas as pd
import numpy as np

df={'a1':[5,6,3,2,5],'a2':[23,43,56,2,6], 'a3':[4,2,3,6,7], 'a4':[1,2,1,3,2],'a5':[4,98,23,5,7],'a6':[5,43,3,2,5]}

x=pd.DataFrame(df).T

rslt = pd.DataFrame(np.zeros((0,3)), columns=['top1','top2','top3'])
for i in x.columns:
    df1row = pd.DataFrame(x.nlargest(3, i).index.tolist(), index=['top1','top2','top3']).T
    rslt = pd.concat([rslt, df1row], axis=0)

print(rslt)

Out[52]: 
  top1 top2 top3
0   a2   a1   a6
0   a5   a2   a6
0   a2   a5   a1
0   a3   a5   a4
0   a3   a5   a2
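
Note that every row in the output above carries index 0, because each one-row frame is concatenated with its own index. As a hedged alternative sketch (not the answerer's code), Series.nlargest can be called on each row through apply, which avoids the transpose and the loop and keeps the original row index:

```python
import pandas as pd

df = pd.DataFrame({'a1': [5, 6, 3, 2, 5], 'a2': [23, 43, 56, 2, 6],
                   'a3': [4, 2, 3, 6, 7], 'a4': [1, 2, 1, 3, 2],
                   'a5': [4, 98, 23, 5, 7], 'a6': [5, 43, 3, 2, 5]})

labels = ['top1', 'top2', 'top3']

# For each row, take the three largest values and keep only their column
# names; nlargest returns tied values in order of appearance.
rslt = df.apply(lambda row: pd.Series(row.nlargest(3).index, index=labels),
                axis=1)
print(rslt)
```

This still calls a Python function once per row, so it is more about readability than speed.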

answered Oct 26 '22 19:10

2342G456DI8

You can do it like this:

x.T.apply(lambda x: x.sort_values(ascending=False).index).T.filter(['a1','a2','a3']).rename(columns={"a1":'top1',"a2":'top2',"a3":'top3'})

Results:

  top1 top2 top3
0   a2   a6   a1
1   a5   a6   a2
2   a2   a5   a6
3   a3   a5   a4
4   a5   a3   a2

answered Oct 26 '22 19:10

Billy Bonaros
