I get an ndarray by reading it from a file, like this:
my_data = np.genfromtxt(input_file, delimiter='\t', skip_header=0)
Example input (parsed)
[[ 2. 1. 2. 0.]
[ 2. 2. 100. 0.]
[ 2. 3. 100. 0.]
[ 3. 1. 2. 0.]
[ 3. 2. 4. 0.]
[ 3. 3. 6. 0.]
[ 4. 1. 2. 0.]
[ 4. 2. 4. 0.]
[ 4. 3. 6. 0.]]
Longer example input (unparsed).
The first 2 columns are supposed to be int, while the last 2 columns are supposed to be float, but that's what I get. Suggestions are welcome.
The main problem is that I'm trying to sort it with NumPy so that the rows are ordered by the second column first, and then by the first column.
Example of desired output
[[ 2. 1. 2. 0.]
[ 3. 1. 2. 0.]
[ 4. 1. 2. 0.]
[ 2. 2. 100. 0.]
[ 3. 2. 4. 0.]
[ 4. 2. 4. 0.]
[ 2. 3. 100. 0.]
[ 3. 3. 6. 0.]
[ 4. 3. 6. 0.]]
I'm aware of this answer; it works for sorting rows on a single column.
I tried sorting on the second column, since the first one is already sorted, but it's not enough. On occasion, the first column gets reordered too, badly.
new_data = my_data[my_data[:, 1].argsort()]
print(new_data)
#output
[[ 2. 1. 2. 0.]
[ 4. 1. 2. 0.] #ouch
[ 3. 1. 2. 0.] #ouch
[ 2. 2. 100. 0.]
[ 3. 2. 4. 0.]
[ 4. 2. 4. 0.]
[ 2. 3. 100. 0.]
[ 3. 3. 6. 0.]
[ 4. 3. 6. 0.]]
I've also checked this question
The answer mentions
The problem here is that np.lexsort or np.sort do not work on arrays of dtype object. To get around that problem, you could sort the rows_list before creating order_list:
import operator
rows_list.sort(key=operator.itemgetter(0,1,2))
But there is no key parameter in the sort function of type ndarray. And merging fields is not an alternative in my case.
Also, I don't have a header, so if I try to sort using the order parameter, I get an error.
ValueError: Cannot specify order when the array has no fields.
I'd rather sort in place, or at least obtain a result of the same type, ndarray. Then I want to save it to a file.
How do I do this without messing up the data types?
NumPy arrays can be sorted by a single column or row, or by multiple columns or rows, using the argsort() function. argsort() returns an array of indices that would sort the values of an array in ascending order.
To sort the rows of a 2D array in descending order, sort along axis 1 (axis=1) and put a negative sign in front of both the array name and the function call: negating the array before sorting and negating the result afterwards yields descending order.
The NumPy ndarray object also has a method called sort() that sorts the array in place.
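The code for the descending row sort is not shown here. A minimal sketch of what the description presumably refers to, assuming the usual negate, sort, negate idiom; the array `a` and the names `desc_rows` and `b` are made up for illustration:

```python
import numpy as np

a = np.array([[1, 30, 200], [2, 20, 300], [3, 10, 100]])

# Sort the values inside each row in descending order:
# negate, sort ascending along axis 1, then negate back.
desc_rows = -np.sort(-a, axis=1)
print(desc_rows)

# ndarray.sort() works in place and returns None.
# Note: with axis=0 each column is sorted independently,
# so whole rows are NOT kept together.
b = a.copy()
b.sort(axis=0)
print(b)
```

Neither call keeps whole rows together while ordering them by a key column, which is why the answers below use argsort() or lexsort() instead.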
>>> a = np.array([[1,30,200], [2,20,300], [3,10,100]])
>>> a
array([[ 1, 30, 200],
[ 2, 20, 300],
[ 3, 10, 100]])
>>> a[a[:,2].argsort()] #sort by the 3rd column ascending
array([[ 3, 10, 100],
[ 1, 30, 200],
[ 2, 20, 300]])
>>> a[a[:,2].argsort()][::-1] #sort by the 3rd column descending
array([[ 2, 20, 300],
[ 1, 30, 200],
[ 3, 10, 100]])
>>> a[a[:,1].argsort()] #sort by the 2nd column ascending
array([[ 3, 10, 100],
[ 2, 20, 300],
[ 1, 30, 200]])
To explain what is going on here: argsort() returns an array of integer indices that would sort its input:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html
>>> x = np.array([15, 30, 4, 80, 6])
>>> np.argsort(x)
array([2, 4, 0, 1, 3])
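Indexing the array with that result gives the sorted values. The sketch below also passes `kind="stable"`, which is a suggestion of this example rather than something from the original answer: a stable sort keeps rows with equal keys in their original relative order, which may explain the occasional reordering of the first column seen in the question. The `data` array is a shortened, made-up version of the question's input.

```python
import numpy as np

x = np.array([15, 30, 4, 80, 6])
idx = np.argsort(x)
print(x[idx])  # the index array reorders x into ascending order

# The first column is already sorted; a stable sort on the second
# column preserves that order among rows with equal second-column values.
data = np.array([[2., 1.], [2., 2.], [3., 1.], [3., 2.], [4., 1.], [4., 2.]])
stable_idx = data[:, 1].argsort(kind="stable")
print(data[stable_idx])
```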
>>> a = np.array([[2,30,200], [1,30,200], [1,10,200]])
>>> a
array([[ 2, 30, 200],
[ 1, 30, 200],
[ 1, 10, 200]])
>>> a[np.lexsort((a[:,2], a[:,1],a[:,0]))]
array([[ 1, 10, 200],
[ 1, 30, 200],
[ 2, 30, 200]])
Same as above but reversed:
>>> a[np.lexsort((a[:,2], a[:,1],a[:,0]))][::-1]
array([[ 2, 30, 200],
[ 1, 30, 200],
[ 1, 10, 200]])
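np.lexsort uses the last key in the tuple as the primary sort key, so the order of the keys matters. As a sketch, here is the same idea applied to the example data from the question, with the second column as primary key and the first column as tie-breaker; the array is typed in by hand rather than read from the file:

```python
import numpy as np

my_data = np.array([
    [2., 1., 2., 0.],
    [2., 2., 100., 0.],
    [2., 3., 100., 0.],
    [3., 1., 2., 0.],
    [3., 2., 4., 0.],
    [3., 3., 6., 0.],
    [4., 1., 2., 0.],
    [4., 2., 4., 0.],
    [4., 3., 6., 0.],
])

# Keys are listed from least to most significant:
# column 0 breaks ties, column 1 (last) is the primary key.
ind = np.lexsort((my_data[:, 0], my_data[:, 1]))
sorted_data = my_data[ind]
print(sorted_data)
```

This works on a plain float ndarray, so no named fields or header are needed. To overwrite the original array instead of creating a new one, `my_data[:] = my_data[ind]` should do.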
Import the file, letting NumPy guess the types, and sort in place:
import numpy as np
# let numpy guess the type with dtype=None
my_data = np.genfromtxt(infile, dtype=None, names=["a", "b", "c", "d"])
# access columns by name
print(my_data["b"]) # column 1
# sort in place by column "b" (primary key), then by column "a"
my_data.sort(order=["b", "a"])
# save specifying required format (tab separated values)
np.savetxt("sorted.tsv", my_data, fmt="%d\t%d\t%.6f\t%.6f")
Alternatively, specifying the input format and sorting to a new array:
import numpy as np
# tell numpy the first 2 columns are int and the last 2 are floats
my_data = np.genfromtxt(infile, dtype=[('a', '<i8'), ('b', '<i8'), ('x', '<f8'), ('d', '<f8')])
# access columns by name
print(my_data["b"]) # column 1
# get the indices to sort the array using lexsort
# the last element of the tuple (column 1) is used as the primary key
ind = np.lexsort((my_data["a"], my_data["b"]))
# create a new, sorted array
sorted_data = my_data[ind]
# save specifying required format (tab separated values)
np.savetxt("sorted.tsv", sorted_data, fmt="%d\t%d\t%.6f\t%.6f")
Output:
2 1 2.000000 0.000000
3 1 2.000000 0.000000
4 1 2.000000 0.000000
2 2 100.000000 0.000000
3 2 4.000000 0.000000
4 2 4.000000 0.000000
2 3 100.000000 0.000000
3 3 6.000000 0.000000
4 3 6.000000 0.000000
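If a structured dtype is not wanted, the same file layout can apparently be produced from a plain float array, because `%d` formats a whole-number float as an integer. A sketch under that assumption, using a small made-up array and an in-memory buffer in place of a real file:

```python
import io
import numpy as np

# Hypothetical plain float array, as returned by genfromtxt without a dtype.
data = np.array([[3., 1., 2., 0.], [2., 1., 2., 0.], [2., 2., 100., 0.]])

# Sort by column 1 first, then column 0 (the last lexsort key is primary).
data = data[np.lexsort((data[:, 0], data[:, 1]))]

# One format specifier per column; %d truncates, so it is only safe
# when the first two columns really hold whole numbers.
buf = io.StringIO()
np.savetxt(buf, data, fmt="%d\t%d\t%.6f\t%.6f")
print(buf.getvalue())
```

The array stays a float ndarray in memory; only the text written out uses integer formatting for the first two columns.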