Numpy sort ndarray on multiple columns

Tags:

python

arrays

sorting

numpy

I get an ndarray by reading it from a file, like this:

import numpy as np

my_data = np.genfromtxt(input_file, delimiter='\t', skip_header=0)

Example input (parsed)

[[   2.    1.    2.    0.]
 [   2.    2.  100.    0.]
 [   2.    3.  100.    0.]
 [   3.    1.    2.    0.]
 [   3.    2.    4.    0.]
 [   3.    3.    6.    0.]
 [   4.    1.    2.    0.]
 [   4.    2.    4.    0.]
 [   4.    3.    6.    0.]]

Longer example input (unparsed).

The first 2 columns are supposed to be int and the last 2 columns float, but that's not what I get: every column is parsed as float. Suggestions are welcome.

The main problem is that I'm trying to sort it with Numpy so that rows are ordered by the second column first, and then by the first column.

Example of desired output

[[   2.    1.    2.    0.]
 [   3.    1.    2.    0.]
 [   4.    1.    2.    0.]
 [   2.    2.  100.    0.]
 [   3.    2.    4.    0.]
 [   4.    2.    4.    0.]
 [   2.    3.  100.    0.]
 [   3.    3.    6.    0.]
 [   4.    3.    6.    0.]]

I'm aware of this answer, which works for sorting rows by a single column.

I tried sorting only by the second column, since the first one is already sorted, but that's not enough: occasionally the first column gets reordered too, and badly.

new_data = my_data[my_data[:, 1].argsort()]
print(new_data)

#output
[[   2.    1.    2.    0.]
 [   4.    1.    2.    0.] #ouch
 [   3.    1.    2.    0.] #ouch
 [   2.    2.  100.    0.]
 [   3.    2.    4.    0.]
 [   4.    2.    4.    0.]
 [   2.    3.  100.    0.]
 [   3.    3.    6.    0.]
 [   4.    3.    6.    0.]]
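A likely cause of the #ouch rows (an inference, not something stated above): argsort defaults to kind='quicksort', which is not a stable sort, so rows that tie on column 1 can come back in any relative order. Since column 0 is already sorted here, asking for a stable sort should be enough. A minimal sketch, assuming NumPy 1.15 or later (where kind='stable' is accepted) and using the example input above:

```python
import numpy as np

# Example input from the question: column 0 is already sorted
my_data = np.array([
    [2, 1, 2, 0],
    [2, 2, 100, 0],
    [2, 3, 100, 0],
    [3, 1, 2, 0],
    [3, 2, 4, 0],
    [3, 3, 6, 0],
    [4, 1, 2, 0],
    [4, 2, 4, 0],
    [4, 3, 6, 0],
], dtype=float)

# A stable sort on column 1 keeps the existing (already sorted) order
# of column 0 among rows that share the same column-1 value
new_data = my_data[my_data[:, 1].argsort(kind="stable")]
print(new_data)
```

If column 0 were not already sorted, a stable single-column sort would not be enough; np.lexsort on both columns, as in the answers below, covers that case.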

I've also checked this question, whose answer mentions:

The problem here is that np.lexsort or np.sort do not work on arrays of dtype object. To get around that problem, you could sort the rows_list before creating order_list:

import operator
rows_list.sort(key=operator.itemgetter(0,1,2))

But there is no key parameter in ndarray.sort. And merging fields is not an option in my case.

Also, I don't have a header, so if I try to sort using the order parameter, I get an error:

ValueError: Cannot specify order when the array has no fields.

I'd rather sort in place, or at least get a result that is still an ndarray. Then I want to save it to a file.

How do I do this without messing up the data types?

asked Mar 30 '15 17:03

Agostino

2 Answers

Sorting a numpy ndarray by the 1st, 2nd or 3rd column:

>>> a = np.array([[1,30,200], [2,20,300], [3,10,100]])

>>> a
array([[  1,  30, 200],         
       [  2,  20, 300],          
       [  3,  10, 100]])

>>> a[a[:,2].argsort()]           #sort by the 3rd column ascending
array([[  3,  10, 100],
       [  1,  30, 200],
       [  2,  20, 300]])

>>> a[a[:,2].argsort()][::-1]     #sort by the 3rd column descending
array([[  2,  20, 300],
       [  1,  30, 200],
       [  3,  10, 100]])

>>> a[a[:,1].argsort()]        #sort by the 2nd column ascending
array([[  3,  10, 100],
       [  2,  20, 300],
       [  1,  30, 200]])

To explain what is going on here: argsort() returns an array of integer indices into its parent, in the order that would sort it: https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html

>>> x = np.array([15, 30, 4, 80, 6])
>>> np.argsort(x)
array([2, 4, 0, 1, 3])
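In other words, indexing with the result of argsort() reorders the data, and indexing the rows of a 2D array with the argsort() of one column moves whole rows together. A short sketch reusing the arrays from above:

```python
import numpy as np

x = np.array([15, 30, 4, 80, 6])
idx = np.argsort(x)  # positions of the elements in ascending order: [2, 4, 0, 1, 3]

# Fancy indexing with those positions gives the sorted values
print(x[idx])  # [ 4  6 15 30 80]

# The same kind of index array applied to the rows of a 2D array moves
# whole rows, which is why a[a[:, k].argsort()] sorts by column k
a = np.array([[1, 30, 200], [2, 20, 300], [3, 10, 100]])
print(a[a[:, 2].argsort()])
```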

Sort by column 1, then by column 2, then by column 3 (np.lexsort uses the last key in the tuple as the primary sort key):

>>> a = np.array([[2,30,200], [1,30,200], [1,10,200]])

>>> a
array([[  2,  30, 200],
       [  1,  30, 200],
       [  1,  10, 200]])

>>> a[np.lexsort((a[:,2], a[:,1],a[:,0]))]
array([[  1,  10, 200],
       [  1,  30, 200],
       [  2,  30, 200]])
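Because np.lexsort treats the last key as the primary one, the call above orders rows by a[:, 0] first, then a[:, 1], then a[:, 2]. It plays the same role as the key=operator.itemgetter(...) list sort quoted in the question. A small sketch checking the two against each other (the list conversion is only for the comparison):

```python
import operator

import numpy as np

a = np.array([[2, 30, 200], [1, 30, 200], [1, 10, 200]])

# np.lexsort: the LAST key is the primary one, so the columns are passed
# in reverse priority order to sort by a[:, 0], then a[:, 1], then a[:, 2]
by_lexsort = a[np.lexsort((a[:, 2], a[:, 1], a[:, 0]))]

# Plain Python equivalent: sort a list of rows with a tuple key
rows = a.tolist()
rows.sort(key=operator.itemgetter(0, 1, 2))

print(by_lexsort.tolist() == rows)  # True
```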

Same as above but reversed:

>>> a[np.lexsort((a[:,2], a[:,1],a[:,0]))][::-1]
array([[  2,  30, 200],
       [  1,  30, 200],
       [  1,  10, 200]])
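Reversing with [::-1] flips the direction of every key at once. For signed integer or floating-point columns, another option (not part of the original answer) is to negate individual keys inside np.lexsort, which also allows mixing directions. A sketch, reusing the array above:

```python
import numpy as np

a = np.array([[2, 30, 200], [1, 30, 200], [1, 10, 200]])

# Negating every key gives a fully descending order, like [::-1] above
desc = a[np.lexsort((-a[:, 2], -a[:, 1], -a[:, 0]))]
print(desc)

# Mixed directions: a[:, 0] descending (primary), a[:, 1] ascending (secondary)
mixed = a[np.lexsort((a[:, 1], -a[:, 0]))]
print(mixed)
```

Negation is not suitable for unsigned integer columns (values wrap around) or for string columns.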

answered Sep 22 '22 08:09

Eric Leschinski

Import the data, letting Numpy guess the types, then sort in place:

import numpy as np

# let numpy guess the type with dtype=None
my_data = np.genfromtxt(infile, dtype=None, names=["a", "b", "c", "d"])

# access columns by name
print(my_data["b"]) # column 1

# sort in place by column 1 ("b"), then by column 0 ("a")
my_data.sort(order=["b", "a"])

# save specifying required format (tab separated values)
np.savetxt("sorted.tsv", my_data, fmt="%d\t%d\t%.6f\t%.6f")
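If the data has already been read as a plain 2D float array (as with the question's genfromtxt(..., delimiter='\t') call), one option, assuming NumPy 1.16 or later, is to convert it to a structured array with numpy.lib.recfunctions.unstructured_to_structured, choosing integer types for the first two fields, and then sort in place with order=. The field names a to d mirror the snippet above, the sample values are the question's example input, and an in-memory buffer stands in for sorted.tsv:

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Plain 2D float array, as genfromtxt returns it without dtype/names
plain = np.array([
    [2, 1, 2, 0], [2, 2, 100, 0], [2, 3, 100, 0],
    [3, 1, 2, 0], [3, 2, 4, 0], [3, 3, 6, 0],
    [4, 1, 2, 0], [4, 2, 4, 0], [4, 3, 6, 0],
], dtype=float)

# Target record layout: two int fields, two float fields
dt = np.dtype([("a", "<i8"), ("b", "<i8"), ("c", "<f8"), ("d", "<f8")])

# Each row (last axis) becomes one record; a and b are cast to int
my_data = rfn.unstructured_to_structured(plain, dtype=dt)

# Now the array has fields, so order= works: sort by "b", then by "a"
my_data.sort(order=["b", "a"])

# Write tab-separated text to an in-memory buffer instead of a file
out = io.StringIO()
np.savetxt(out, my_data, fmt="%d\t%d\t%.6f\t%.6f")
print(out.getvalue())
```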

Alternatively, specifying the input format and sorting to a new array:

import numpy as np

# tell numpy the first 2 columns are int and the last 2 are floats
my_data = np.genfromtxt(infile, dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f8'), ('d', '<f8')])

# access columns by name
print(my_data["b"]) # column 1

# get the indices to sort the array using lexsort
# the last element of the tuple (column 1) is used as the primary key
ind = np.lexsort((my_data["a"], my_data["b"]))

# create a new, sorted array
sorted_data = my_data[ind]

# save specifying required format (tab separated values)
np.savetxt("sorted.tsv", sorted_data, fmt="%d\t%d\t%.6f\t%.6f")

Output (contents of sorted.tsv):

2   1   2.000000    0.000000
3   1   2.000000    0.000000
4   1   2.000000    0.000000
2   2   100.000000  0.000000
3   2   4.000000    0.000000
4   2   4.000000    0.000000
2   3   100.000000  0.000000
3   3   6.000000    0.000000
4   3   6.000000    0.000000
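The same ordering can also be obtained on the plain 2D float array from the question, without field names: pass the columns directly to np.lexsort (primary key last) and give one format per column when saving. A sketch, with an in-memory buffer standing in for the output file; note that the array itself stays float64, and only the text output shows the first two columns as integers:

```python
import io

import numpy as np

# Plain 2D float array, as np.genfromtxt(input_file, delimiter='\t') returns it
my_data = np.array([
    [2, 1, 2, 0], [2, 2, 100, 0], [2, 3, 100, 0],
    [3, 1, 2, 0], [3, 2, 4, 0], [3, 3, 6, 0],
    [4, 1, 2, 0], [4, 2, 4, 0], [4, 3, 6, 0],
], dtype=float)

# Sort by column 1 (primary key, passed last), then by column 0
sorted_data = my_data[np.lexsort((my_data[:, 0], my_data[:, 1]))]

# One format per column; savetxt joins them with the delimiter
out = io.StringIO()
np.savetxt(out, sorted_data, fmt=["%d", "%d", "%.6f", "%.6f"], delimiter="\t")
print(out.getvalue())
```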

answered Sep 19 '22 08:09

Padraic Cunningham
