Efficient swapping of elements in numpy array

Assume we have a large matrix A and the indices of two of its elements, (c1, r1) and (c2, r2), that we want to swap:

import numpy as np
A = np.random.rand(1000,1000)
c1, r1 = 10, 10
c2, r2 = 20, 40

The Pythonic way to do so would be:

A[c1, r1], A[c2, r2] = A[c2, r2], A[c1, r1]

However, this solution can be slow if you need to perform a large number of swaps.

Is there a more efficient way to swap two elements in a numpy array?

Thanks in advance.

lackadaisical asked Feb 20 '15 11:02



1 Answer

Preliminary answer, which does not work

You can easily vectorize the swap operation by using arrays of indices (c1, r1, c2, r2) instead of iterating over lists of scalar indices.

c1 = np.array(<all the "c1" values>, dtype=int)
r1 = np.array(<all the "r1" values>, dtype=int)
c2 = np.array(<all the "c2" values>, dtype=int)
r2 = np.array(<all the "r2" values>, dtype=int)
A[c1, r1], A[c2, r2] = A[c2, r2], A[c1, r1]

Note that this performs all the swaps in one go, which can give a different result from performing them one after another when the order of the swaps matters. For this reason, this is not a valid answer to your question.

E.g. swapping p1 with p2 and then p2 with p3 is different from swapping p1 with p2 and p2 with p3 in one go. In the latter case, both p1 and p3 end up with the original value of p2, and p2 ends up with the original value of either p1 or p3, depending on how the pairs are laid out in the index arrays and which assignment is applied last.
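
The difference is easy to reproduce on a tiny array. The sketch below is only an illustration: the 1-D array and the names orig, first and second are made up for the example, with positions 0, 1, 2 standing in for p1, p2, p3.

```python
import numpy as np

# Hypothetical 1-D example: positions 0, 1, 2 play the roles of p1, p2, p3.
orig = np.array([10.0, 20.0, 30.0])
first = np.array([0, 1])    # first point of each swap:  p1, p2
second = np.array([1, 2])   # second point of each swap: p2, p3

# Sequential: swap p1 <-> p2, then p2 <-> p3.
seq = orig.copy()
for i, j in zip(first, second):
    seq[i], seq[j] = seq[j], seq[i]

# Vectorized: both swaps in a single fancy-indexed assignment.
# The right-hand side is evaluated first (fancy indexing returns copies),
# then vec[first] is assigned, then vec[second].
vec = orig.copy()
vec[first], vec[second] = vec[second], vec[first]

print(seq)  # [20. 30. 10.]
print(vec)  # [20. 10. 20.]
```

In this particular layout, p1 and p3 both receive the original value of p2, and p2 receives the original value of p1, because the assignment to vec[second] is applied last.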

However, since it is speed you're after, vectorizing the operation (in some way) must be the way to go.


Adding correctness to the above solution

So how can we perform vectorized swapping while still getting the output we need? We can take a hybrid approach: break the lists of indices into chunks (as few as possible), where each chunk contains only unique points, which guarantees that the order within a chunk makes no difference. Each chunk is swapped in a single vectorized operation, and we iterate only over the chunks.

Here's a sketch of how this can work:

import numpy as np

c1, r1 = np.array([ np.arange(10), np.arange(10) ])
c2, r2 = np.array([ [2,4,6,8,0,1,3,5,7,9], [9,1,6,8,2,2,2,2,7,0] ])
A = np.random.rand(15, 15)

def get_chunk_boundary(c1, r1, c2, r2):
    # Encode each (c, r) point as one complex number so it can be stored in a set.
    a1 = c1 + 1j * r1
    a2 = c2 + 1j * r2
    seen = set()
    for i, (x1, x2) in enumerate(zip(a1, a2)):
        # Stop at the first pair that touches a point already used in this chunk.
        if x1 in seen or x2 in seen:
            return i
        seen.add(x1); seen.add(x2)
    return len(c1)

while len(c1) > 0:
    i = get_chunk_boundary(c1, r1, c2, r2)
    c1b = c1[:i]; r1b = r1[:i]; c2b = c2[:i]; r2b = r2[:i]
    print('swapping %d elements' % i)
    # Within a chunk every point is unique, so one vectorized swap is safe.
    A[c1b,r1b], A[c2b,r2b] = A[c2b,r2b], A[c1b,r1b]
    c1 = c1[i:]; r1 = r1[i:]; c2 = c2[i:]; r2 = r2[i:]

Slicing here will be faster if you store the indices as a single 2-D array (N x 4) to begin with, since each chunk then needs one slice instead of four.
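
A minimal sketch of that N x 4 layout follows, assuming the columns are ordered c1, r1, c2, r2. The names swap_chunked, swap_sequential and idx are hypothetical, introduced only for this illustration. The result is checked against a plain one-swap-at-a-time loop on random data.

```python
import numpy as np

def swap_sequential(A, idx):
    """Reference implementation: apply the swaps one at a time, in order."""
    for c1, r1, c2, r2 in idx.tolist():
        A[c1, r1], A[c2, r2] = A[c2, r2], A[c1, r1]
    return A

def swap_chunked(A, idx):
    """Apply the swaps in idx (N x 4; assumed column order c1, r1, c2, r2)
    chunk by chunk, where no point occurs twice within a chunk."""
    rows = idx.tolist()  # plain Python ints are cheap to hash
    start = 0
    while start < len(rows):
        seen = set()
        end = start
        # Grow the chunk until a pair touches a point already in it.
        # The first pair always fits, so each pass makes progress.
        while end < len(rows):
            c1, r1, c2, r2 = rows[end]
            if (c1, r1) in seen or (c2, r2) in seen:
                break
            seen.add((c1, r1))
            seen.add((c2, r2))
            end += 1
        # One basic slice (a view) yields all four index columns at once.
        c1, r1, c2, r2 = idx[start:end].T
        A[c1, r1], A[c2, r2] = A[c2, r2], A[c1, r1]
        start = end
    return A

rng = np.random.default_rng(0)
A = rng.random((15, 15))
idx = rng.integers(0, 15, size=(200, 4))  # 200 random swaps, many overlapping

expected = swap_sequential(A.copy(), idx)
result = swap_chunked(A.copy(), idx)
print(np.array_equal(result, expected))
```

How much this gains over the plain loop depends on how long the chunks are: with many repeated points the chunks are short and the per-chunk overhead dominates, so it is worth timing on your own data.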

shx2 answered Oct 27 '22 21:10