Sorting arrays in NumPy by column


To sort by the second column of a:

a[a[:, 1].argsort()]
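Broken into steps, the one-liner above works like this. This is a small sketch that uses the same 3x3 array as the next answer:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [0, 0, 1]])

col = a[:, 1]            # second column: array([2, 5, 0])
order = col.argsort()    # row indices that would sort that column: array([2, 0, 1])
sorted_a = a[order]      # fancy indexing reorders whole rows and returns a copy

print(sorted_a)
# [[0 0 1]
#  [1 2 3]
#  [4 5 6]]
```

The original `a` is left unchanged. Assign the result back (`a = a[order]`) if you want to replace it.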

@Steve's answer is actually the most elegant way of doing it.

For the "correct" way, see the order keyword argument of numpy.ndarray.sort.

However, you'll need to view your array as an array with fields (a structured array).

The "correct" way is quite ugly if you didn't initially define your array with fields...

As a quick example, to sort it and return a copy:

In [1]: import numpy as np

In [2]: a = np.array([[1,2,3],[4,5,6],[0,0,1]], dtype=np.int64)  # 'i8' views below require int64 data

In [3]: np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(np.int64)
Out[3]: 
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])

To sort it in-place:

In [6]: a.view('i8,i8,i8').sort(order=['f1'], axis=0) #<-- returns None

In [7]: a
Out[7]: 
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])
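One possible alternative to hard-coding 'i8,i8,i8' (not from the answer above) is NumPy's numpy.lib.recfunctions helpers, which build the structured dtype from the array's own dtype. Here is a minimal sketch, assuming NumPy 1.16 or later, where unstructured_to_structured and structured_to_unstructured are available:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([[1, 2, 3], [4, 5, 6], [0, 0, 1]])

# Build a 1-D structured copy of a with fields f0, f1, f2 of a's dtype,
# so the field types never need to be spelled out by hand.
s = rfn.unstructured_to_structured(a, copy=True)

# Sort the records by field f1 (the second column); np.sort returns a new array.
s_sorted = np.sort(s, order=["f1"])

# Convert back to an ordinary 2-D array.
result = rfn.structured_to_unstructured(s_sorted)
print(result)
```

This avoids a dtype mismatch on platforms where the default integer is not int64, at the cost of an extra copy.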

@Steve's really is the most elegant way to do it, as far as I know...

The only advantage of this method is that the "order" argument is a list of the fields to sort by, in priority order. For example, you can sort by the second column, then the third column, then the first column by supplying order=['f1','f2','f0'].
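Here is a sketch of that multi-key sort with the structured view. It uses a small, made-up array with ties in the second column and assumes int64, C-contiguous data so that the 'i8,i8,i8' view covers exactly one row per record:

```python
import numpy as np

b = np.array([[3, 1, 2],
              [1, 0, 5],
              [2, 1, 0],
              [0, 1, 2]], dtype=np.int64)

# Each row becomes one record with fields f0, f1, f2; the view has shape (4, 1).
records = b.view("i8,i8,i8")

# Sort by f1, break ties with f2, then with f0; view back as a (4, 3) int64 array.
result = np.sort(records, order=["f1", "f2", "f0"], axis=0).view(np.int64)
print(result)
# [[1 0 5]
#  [2 1 0]
#  [0 1 2]
#  [3 1 2]]
```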

You can sort on multiple columns as per Steve Tjoa's method by using a stable sort like mergesort and sorting the indices from the least significant to the most significant columns:

a = a[a[:,2].argsort()] # First sort doesn't need to be stable.
a = a[a[:,1].argsort(kind='mergesort')]
a = a[a[:,0].argsort(kind='mergesort')]

This sorts by column 0, then 1, then 2.
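The same column 0, column 1, column 2 ordering can also be obtained with np.lexsort. That function is not mentioned in the answers here, but it takes its keys in the same least-to-most-significant order (the last key is the primary one). The following sketch checks that the two approaches agree on a small, made-up array:

```python
import numpy as np

a = np.array([[2, 1, 9],
              [1, 3, 4],
              [2, 0, 7],
              [1, 3, 2],
              [2, 1, 3]])

# Chained argsorts: least significant column first; later sorts must be stable.
chained = a[a[:, 2].argsort()]
chained = chained[chained[:, 1].argsort(kind="mergesort")]
chained = chained[chained[:, 0].argsort(kind="mergesort")]

# np.lexsort does it in one call; the LAST key (column 0) is the primary sort key.
idx = np.lexsort((a[:, 2], a[:, 1], a[:, 0]))
via_lexsort = a[idx]

print(via_lexsort)
print(np.array_equal(chained, via_lexsort))  # True
```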

In case someone wants to use sorting in a performance-critical part of their program, here's a performance comparison of the different proposals:

import numpy as np
table = np.random.rand(5000, 10)

%timeit table.view('f8,f8,f8,f8,f8,f8,f8,f8,f8,f8').sort(order=['f9'], axis=0)
1000 loops, best of 3: 1.88 ms per loop

%timeit table[table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop

import pandas as pd
df = pd.DataFrame(table)
%timeit df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop

So, it looks like indexing with argsort is the quickest method so far...
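For readers not using IPython, here is a rough script version of the NumPy part of the comparison, built on the standard timeit module. It is a sketch with a few assumptions. The function names are hypothetical and pandas is left out. The structured sort works on a copy so that repeated runs don't re-sort already-sorted data, and that copy is included in its timing, just as argsort indexing allocates a new array. Absolute numbers will differ from those above.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
table = rng.random((5000, 10))

def argsort_rows(t):
    # Reorder rows by the last column (index 9) via argsort + fancy indexing.
    return t[t[:, 9].argsort()]

def structured_sort(t):
    # Sorting the structured view is in-place, so sort a copy instead of t itself.
    t = t.copy()
    t.view(",".join(["f8"] * 10)).sort(order=["f9"], axis=0)
    return t

# Sanity check: both approaches should produce the same row order.
assert np.array_equal(argsort_rows(table), structured_sort(table))

for fn in (argsort_rows, structured_sort):
    seconds = timeit.timeit(lambda: fn(table), number=100) / 100
    print(f"{fn.__name__}: {seconds * 1e6:.0f} µs per call")
```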

From the Python documentation wiki, I think you can do:

a = [[1, 2, 3], [4, 5, 6], [0, 0, 1]]
a = sorted(a, key=lambda row: row[1])
print(a)

The output is:

[[0, 0, 1], [1, 2, 3], [4, 5, 6]]
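The same pure-Python approach extends to several columns with operator.itemgetter, which builds a tuple key. Here is a sketch on a made-up list of rows:

```python
from operator import itemgetter

rows = [[1, 2, 3], [4, 5, 6], [0, 0, 1], [7, 2, 0]]

# Sort by column 1, breaking ties with column 2 (itemgetter(1, 2) returns a tuple).
by_col1_then_col2 = sorted(rows, key=itemgetter(1, 2))
print(by_col1_then_col2)
# [[0, 0, 1], [7, 2, 0], [1, 2, 3], [4, 5, 6]]
```

Calling rows.sort() with the same key sorts the list in place instead. Passing a 2D NumPy array to sorted() returns a Python list of row arrays, so for NumPy data the argsort indexing shown earlier is usually more convenient.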


Tags: python, arrays, sorting, numpy, scipy
