
Fastest way to sort each row in a pandas dataframe

I need to find the quickest way to sort each row in a dataframe with millions of rows and around a hundred columns.

So something like this:

A   B   C   D
3   4   8   1
9   2   7   2

Needs to become:

A   B   C   D
8   4   3   1
9   7   2   2

Right now I'm applying sort to each row and building up a new dataframe row by row. I'm also doing a couple of extra, less important things to each row (hence why I'm using pandas and not numpy). Could it be quicker to instead create a list of lists and then build the new dataframe at once? Or do I need to go cython?

Luke asked Sep 12 '14 22:09



2 Answers

I think I would do this in numpy:

In [11]: a = df.values

In [12]: a.sort(axis=1)  # sorts each row in place; ndarray.sort has no ascending argument

In [13]: a = a[:, ::-1]  # so reverse each row to get descending order

In [14]: a
Out[14]:
array([[8, 4, 3, 1],
       [9, 7, 2, 2]])

In [15]: pd.DataFrame(a, df.index, df.columns)
Out[15]:
   A  B  C  D
0  8  4  3  1
1  9  7  2  2
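The same idea can be sketched as a standalone script. This version swaps the in-place `a.sort(axis=1)` for `np.sort`, which returns a sorted copy, on the assumption that you would rather leave the original frame untouched; the variable names `sorted_desc` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# The example frame from the question.
df = pd.DataFrame([[3, 4, 8, 1], [9, 2, 7, 2]], columns=list("ABCD"))

# np.sort returns a sorted copy, so df itself is not modified.
# axis=1 sorts within each row (ascending); [:, ::-1] reverses each row.
sorted_desc = np.sort(df.to_numpy(), axis=1)[:, ::-1]

# Rebuild a frame with the original index and column labels.
result = pd.DataFrame(sorted_desc, index=df.index, columns=df.columns)
print(result)
```

Because the whole array is sorted in one vectorized call, this avoids building the new frame row by row.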

I had thought this might work, but it sorts the columns by label rather than sorting the values within each row:

In [21]: df.sort(axis=1, ascending=False)
Out[21]:
   D  C  B  A
0  1  8  4  3
1  2  7  2  9

And passing the column labels explicitly makes pandas raise:

In [22]: df.sort(df.columns, axis=1, ascending=False)

ValueError: When sorting by column, axis must be 0 (rows)
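`DataFrame.sort` has since been removed from pandas, so the calls above will not run on a current install. As a rough modern equivalent of the first attempt, `sort_index(axis=1, ascending=False)` shows the same behavior: it reorders the column labels and never compares the values inside a row. The name `by_label` is just for illustration.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame([[3, 4, 8, 1], [9, 2, 7, 2]], columns=list("ABCD"))

# Sorts the column labels in descending order (D, C, B, A).
# Each column moves as a whole; values within a row are not sorted.
by_label = df.sort_index(axis=1, ascending=False)
print(by_label)
```

So the pandas sort methods are not a shortcut here, which is why dropping down to numpy is the practical route.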

Andy Hayden answered Sep 17 '22 07:09


To add to the answer given by @Andy-Hayden: to do this in place on the whole frame... I'm not really sure why this works, but it does. There seems to be no control over the sort order.

    In [97]: A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])

    In [98]: A
    Out[98]: 
       one  two  three  four  five
    0   22   63     72    46    49
    1   43   30     69    33    25
    2   93   24     21    56    39
    3    3   57     52    11    74

    In [99]: A.values.sort
    Out[99]: <function ndarray.sort>

    In [100]: A
    Out[100]: 
       one  two  three  four  five
    0   22   63     72    46    49
    1   43   30     69    33    25
    2   93   24     21    56    39
    3    3   57     52    11    74

    In [101]: A.values.sort()

    In [102]: A
    Out[102]: 
       one  two  three  four  five
    0   22   46     49    63    72
    1   25   30     33    43    69
    2   21   24     39    56    93
    3    3   11     52    57    74

    In [103]: A = A.iloc[:,::-1]

    In [104]: A
    Out[104]: 
       five  four  three  two  one
    0    72    63     49   46   22
    1    69    43     33   30   25
    2    93    56     39   24   21
    3    74    57     52   11    3

I hope someone can explain the why of this, just happy that it works 8)
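A likely explanation, offered as a sketch rather than a confirmed account of pandas internals: `ndarray.sort()` sorts in place along the last axis by default (`axis=-1`), which for a 2-D array means within each row. When every column shares one dtype, `A.values` can return a view of the frame's underlying data, so the in-place sort shows through in `A`. That view behavior is not guaranteed (mixed dtypes yield a copy, and pandas with copy-on-write enabled may hand back a read-only array), so the version below sorts an explicit copy instead. It also reverses the array rather than the frame, so the column labels keep their original order, unlike `A.iloc[:, ::-1]`, which reverses the labels too. The seed and the name `A_desc` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
A = pd.DataFrame(rng.integers(0, 100, (4, 5)),
                 columns=['one', 'two', 'three', 'four', 'five'])

arr = A.to_numpy(copy=True)  # explicit copy: never aliases A's data
arr.sort()                   # default axis=-1: each row sorted ascending, in place
arr = arr[:, ::-1]           # reverse each row for descending order

# Column labels stay in their original order; only the values move.
A_desc = pd.DataFrame(arr, index=A.index, columns=A.columns)
print(A_desc)
```

Dropping the `arr[:, ::-1]` line gives ascending order, which is the control over ordering that the in-place trick seemed to lack.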

SpmP answered Sep 18 '22 07:09