Efficient way of making a list of pairs from an array in Numpy

I have a NumPy integer array x with shape (n, 4), for example:

[[0 1 2 3],
[1 2 7 9],
[2 1 5 2],
...]

I want to transform the array into an array of pairs:

[0,1]
[0,2]
[0,3]
[1,2]
...

That is, the first element of each row is paired with each of the other elements in the same row. I already have a for-loop solution:

import numpy as np
y = np.array([[x[j, 0], x[j, i]] for j in range(n) for i in range(1, 4)], dtype=int)  # pairs in row order

Since looping over a NumPy array in Python is slow, I tried slicing instead. I can slice out the pairs for one column at a time:

y[1]=np.array([x[:,0],x[:,1]]).T
# [[0,1],[1,2],[2,1],...] 

I can repeat this for all columns. My questions are:

  1. How can I append y[2], ... to y[1] so that the result has shape (N, 2)?
  2. If the number of columns is large (here it is only 4), how can I compute each y[i] elegantly?
  3. What other ways are there to build the final array?
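For questions 1 and 2, a minimal sketch (assuming, as in the question, that the first column is always the one being paired) builds every y[i] in a list comprehension and joins them with np.concatenate. Concatenating groups the pairs column by column; stacking the blocks on a new middle axis and then reshaping gives the row-by-row order of the desired output shown above.

```python
import numpy as np

x = np.array([[0, 1, 2, 3],
              [1, 2, 7, 9],
              [2, 1, 5, 2]])

# One (rows, 2) block per non-pivot column, i.e. the question's y[1], y[2], ...
blocks = [np.column_stack((x[:, 0], x[:, i])) for i in range(1, x.shape[1])]

# Question 1: append the blocks to each other -> shape (rows * (cols - 1), 2),
# with the pairs grouped column by column.
by_column = np.concatenate(blocks, axis=0)

# Stacking along a new middle axis gives shape (rows, cols - 1, 2);
# reshaping that groups the pairs row by row instead.
by_row = np.stack(blocks, axis=1).reshape(-1, 2)

print(by_column[:4])
print(by_row[:4])
```

The comprehension still loops in Python, but only once per column rather than once per element, so it stays cheap when the number of rows is large.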
asked by Mahdi, Dec 09 '25 06:12


2 Answers

The cleanest way of doing this I can think of would be:

>>> import numpy as np
>>> x = np.arange(12).reshape(3, 4)
>>> x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> n = x.shape[1] - 1
>>> y = np.repeat(x, (n,)+(1,)*n, axis=1)
>>> y
array([[ 0,  0,  0,  1,  2,  3],
       [ 4,  4,  4,  5,  6,  7],
       [ 8,  8,  8,  9, 10, 11]])
>>> y.reshape(-1, 2, n).transpose(0, 2, 1).reshape(-1, 2)
array([[ 0,  1],
       [ 0,  2],
       [ 0,  3],
       [ 4,  5],
       [ 4,  6],
       [ 4,  7],
       [ 8,  9],
       [ 8, 10],
       [ 8, 11]])
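A quick way to see where this repeat/reshape version copies data is np.shares_memory, which reports whether two arrays overlap in memory. This sketch reuses the same x and n as the session above:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
n = x.shape[1] - 1

# np.repeat always returns a new array: first copy.
y = np.repeat(x, (n,) + (1,) * n, axis=1)
print(np.shares_memory(x, y))

# Reshaping a contiguous array and transposing it only create views...
t = y.reshape(-1, 2, n).transpose(0, 2, 1)
print(np.shares_memory(y, t))

# ...but the transposed view is not C-contiguous, so collapsing it to
# (-1, 2) cannot be expressed as a view: second copy.
pairs = t.reshape(-1, 2)
print(np.shares_memory(t, pairs))
```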

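This makes two copies of the data (np.repeat allocates a new array, and the final reshape has to copy again because the transposed array is no longer contiguous), so it is not the most efficient method. The most efficient one would probably be something like: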

>>> y = np.empty((x.shape[0], n, 2), dtype=x.dtype)
>>> y[..., 0] = x[:, 0, None]
>>> y[..., 1] = x[:, 1:]
>>> y.shape = (-1, 2)
>>> y
array([[ 0,  1],
       [ 0,  2],
       [ 0,  3],
       [ 4,  5],
       [ 4,  6],
       [ 4,  7],
       [ 8,  9],
       [ 8, 10],
       [ 8, 11]])
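A further alternative, not taken from either answer, builds the same (rows, cols - 1, 2) layout in one expression: np.broadcast_to repeats the first column as a read-only view, np.stack writes both halves into a single new array, and the final reshape of that contiguous array is a view rather than a second copy. This is a sketch assuming the same x as above.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
m, n = x.shape

# Broadcast the first column across the n - 1 partner columns without copying it.
firsts = np.broadcast_to(x[:, :1], (m, n - 1))

# Pair each first element with its partners along a new last axis:
# shape (m, n - 1, 2), freshly allocated and C-contiguous.
pairs = np.stack((firsts, x[:, 1:]), axis=-1)

# Collapsing the leading axes of a contiguous array is a view, not a copy.
y = pairs.reshape(-1, 2)
print(y)
```

Like the np.empty version, this allocates the output only once; it trades the explicit preallocation for a single call.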
answered by Jaime, Dec 10 '25 19:12


Like Jaime, I first tried repeating the first column and then reshaping, but decided it was simpler to build two intermediate arrays and hstack them:

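import numpy as np

x=np.array([[0,1,2,3],[1,2,7,9],[2,1,5,2]])
m,n=x.shape
x1=x[:,0].repeat(n-1)[:,None]   # first column, each value repeated n-1 times, as a column
x2=x[:,1:].reshape(-1,1)        # remaining columns flattened row by row, as a column
np.hstack([x1,x2])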

producing

array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 2],
       [1, 7],
       [1, 9],
       [2, 1],
       [2, 5],
       [2, 2]])

There are probably other ways of doing this sort of rearrangement. Any of them will copy the original data in one way or another. My guess is that as long as you use compiled functions like reshape and repeat, the time differences won't be significant.
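A rough way to test that guess is to time the approaches side by side. This is a sketch: the function names are hypothetical wrappers around the code in this thread, and the array size is an arbitrary assumption. Actual timings depend on the machine and NumPy version, so none are quoted here.

```python
import timeit
import numpy as np

def pairs_hstack(x):
    """hpaulj's approach: two intermediate columns, then hstack."""
    m, n = x.shape
    x1 = x[:, 0].repeat(n - 1)[:, None]
    x2 = x[:, 1:].reshape(-1, 1)
    return np.hstack([x1, x2])

def pairs_empty(x):
    """Jaime's preallocated-output approach."""
    n = x.shape[1] - 1
    y = np.empty((x.shape[0], n, 2), dtype=x.dtype)
    y[..., 0] = x[:, 0, None]
    y[..., 1] = x[:, 1:]
    return y.reshape(-1, 2)

def pairs_loop(x):
    """The question's list comprehension, producing pairs in row order."""
    rows, cols = x.shape
    return np.array([[x[j, 0], x[j, i]] for j in range(rows) for i in range(1, cols)],
                    dtype=x.dtype)

rng = np.random.default_rng(0)
x = rng.integers(0, 100, size=(20_000, 4))

# All three must agree before their timings mean anything.
assert np.array_equal(pairs_hstack(x), pairs_empty(x))
assert np.array_equal(pairs_hstack(x), pairs_loop(x))

for f in (pairs_loop, pairs_hstack, pairs_empty):
    t = timeit.timeit(lambda: f(x), number=3)
    print(f"{f.__name__}: {t / 3 * 1e3:.2f} ms per call")
```

The interesting comparison is usually between the Python loop and any of the vectorized versions; differences among the vectorized ones tend to be much smaller.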

answered by hpaulj, Dec 10 '25 20:12


