I have a 2-dimensional NumPy array with an equal number of rows and columns. I would like to place copies of it on the diagonal of a bigger array, with all other entries zero. It should be possible to specify how many times the starting matrix appears on the diagonal. For example:
a = numpy.array([[5, 7],
                 [6, 3]])
So if I wanted this array 2 times on the diagonal the desired output would be:
array([[5, 7, 0, 0],
       [6, 3, 0, 0],
       [0, 0, 5, 7],
       [0, 0, 6, 3]])
For 3 times:
array([[5, 7, 0, 0, 0, 0],
       [6, 3, 0, 0, 0, 0],
       [0, 0, 5, 7, 0, 0],
       [0, 0, 6, 3, 0, 0],
       [0, 0, 0, 0, 5, 7],
       [0, 0, 0, 0, 6, 3]])
Is there a fast way to implement this with NumPy methods, for arbitrary sizes of the starting array (still assuming it has the same number of rows and columns)?
Approach #1

Classic case for numpy.kron:

np.kron(np.eye(r, dtype=int), a)  # r is the number of repeats

Sample run:
In [184]: a
Out[184]:
array([[1, 2, 3],
       [3, 4, 5]])

In [185]: r = 3  # number of repeats

In [186]: np.kron(np.eye(r, dtype=int), a)
Out[186]:
array([[1, 2, 3, 0, 0, 0, 0, 0, 0],
       [3, 4, 5, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 2, 3, 0, 0, 0],
       [0, 0, 0, 3, 4, 5, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 2, 3],
       [0, 0, 0, 0, 0, 0, 3, 4, 5]])
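For a reusable helper, the one-liner can be wrapped in a function; passing the input's dtype to np.eye (which otherwise defaults to float64) keeps integer inputs integer. The function name here is just an illustration:

```python
import numpy as np

def tile_diag_kron(a, r):
    # Kronecker product of the identity with `a`: each 1 on the
    # identity's diagonal expands into one copy of `a`, and each 0
    # into a zero block of the same shape.
    return np.kron(np.eye(r, dtype=a.dtype), a)

a = np.array([[5, 7],
              [6, 3]])
print(tile_diag_kron(a, 2))
```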
Approach #2

Another efficient one, with diagonal-viewed-array assignment:

def repeat_along_diag(a, r):
    m, n = a.shape
    out = np.zeros((r, m, r, n), dtype=a.dtype)
    # 'ijik->ijk' gives a writable view of the diagonal blocks of `out`
    diag = np.einsum('ijik->ijk', out)
    diag[:] = a
    return out.reshape(-1, n * r)

Sample run:
In [188]: repeat_along_diag(a, 3)
Out[188]:
array([[1, 2, 3, 0, 0, 0, 0, 0, 0],
       [3, 4, 5, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 2, 3, 0, 0, 0],
       [0, 0, 0, 3, 4, 5, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 2, 3],
       [0, 0, 0, 0, 0, 0, 3, 4, 5]])
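The trick in Approach #2 is that np.einsum, when used for pure diagonal extraction (no summation), returns a writable view rather than a copy: 'ijik->ijk' selects the entries where block-row index equals block-column index, so assigning to the view fills exactly the diagonal blocks of the 4D array. A quick sketch verifying the view behavior:

```python
import numpy as np

out = np.zeros((3, 2, 3, 2), dtype=int)
diag = np.einsum('ijik->ijk', out)   # view of the diagonal blocks

# the view shares memory with `out`, so writing to it mutates `out`
print(np.shares_memory(out, diag))   # -> True

a = np.array([[5, 7],
              [6, 3]])
diag[:] = a
print(out.reshape(6, 6))
```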
import numpy as np
from scipy.linalg import block_diag

a = np.array([[5, 7],
              [6, 3]])
n = 3

d = block_diag(*([a] * n))
d
array([[5, 7, 0, 0, 0, 0],
       [6, 3, 0, 0, 0, 0],
       [0, 0, 5, 7, 0, 0],
       [0, 0, 6, 3, 0, 0],
       [0, 0, 0, 0, 5, 7],
       [0, 0, 0, 0, 6, 3]], dtype=int32)
But it looks like the np.kron solution is a little bit faster for small n.
%timeit np.kron(np.eye(n), a)
12.4 µs ± 95.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit block_diag(*([a] * n))
19.2 µs ± 34.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
However, for n = 300, for example, block_diag is much faster:
%timeit block_diag(*([a] * n))
1.11 ms ± 32.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.kron(np.eye(n), a)
4.87 ms ± 31 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
For the specialized case of square matrices, simple slicing is WAY faster than numpy.kron() (the slowest) and mostly on par with the numpy.einsum()-based approach (from @Divakar's answer). Compared to scipy.linalg.block_diag(), it performs better for smaller arr, somewhat independently of the number of block repetitions.

Note that the performance of block_diag_view() on smaller inputs can easily be improved further with Numba, at the cost of worse performance on larger inputs. With Numba, full explicit looping and parallelization (block_diag_loop_jit()) again gives results similar to block_diag_einsum() when the number of repetitions is small.

Overall, the best-performing solutions are block_diag_einsum() and block_diag_view().
import numpy as np
import scipy as sp
import numba as nb
import scipy.linalg

NUM = 4
M = 9

def block_diag_kron(arr, num=NUM):
    return np.kron(np.eye(num), arr)

def block_diag_einsum(arr, num=NUM):
    rows, cols = arr.shape
    result = np.zeros((num, rows, num, cols), dtype=arr.dtype)
    diag = np.einsum('ijik->ijk', result)
    diag[:] = arr
    return result.reshape(rows * num, cols * num)

def block_diag_scipy(arr, num=NUM):
    return sp.linalg.block_diag(*([arr] * num))

def block_diag_view(arr, num=NUM):
    rows, cols = arr.shape
    result = np.zeros((num * rows, num * cols), dtype=arr.dtype)
    for k in range(num):
        result[k * rows:(k + 1) * rows, k * cols:(k + 1) * cols] = arr
    return result

@nb.jit
def block_diag_view_jit(arr, num=NUM):
    rows, cols = arr.shape
    result = np.zeros((num * rows, num * cols), dtype=arr.dtype)
    for k in range(num):
        result[k * rows:(k + 1) * rows, k * cols:(k + 1) * cols] = arr
    return result

@nb.jit(parallel=True)
def block_diag_loop_jit(arr, num=NUM):
    rows, cols = arr.shape
    result = np.zeros((num * rows, num * cols), dtype=arr.dtype)
    for k in nb.prange(num):
        for i in nb.prange(rows):
            for j in nb.prange(cols):
                result[i + (rows * k), j + (cols * k)] = arr[i, j]
    return result
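Before benchmarking, a quick correctness cross-check of the pure-NumPy variants is useful (redefined here so the snippet stands alone without Numba; the JIT versions compute the same thing by construction):

```python
import numpy as np
import scipy.linalg

def block_diag_kron(arr, num):
    return np.kron(np.eye(num), arr)

def block_diag_einsum(arr, num):
    rows, cols = arr.shape
    result = np.zeros((num, rows, num, cols), dtype=arr.dtype)
    np.einsum('ijik->ijk', result)[:] = arr
    return result.reshape(rows * num, cols * num)

def block_diag_view(arr, num):
    rows, cols = arr.shape
    result = np.zeros((num * rows, num * cols), dtype=arr.dtype)
    for k in range(num):
        result[k * rows:(k + 1) * rows, k * cols:(k + 1) * cols] = arr
    return result

# all variants must match the scipy reference on a random square block
arr = np.random.randint(1, 9, (5, 5))
ref = scipy.linalg.block_diag(*([arr] * 4))
print(all(np.array_equal(f(arr, 4), ref)
          for f in (block_diag_kron, block_diag_einsum, block_diag_view)))  # -> True
```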
Benchmarks for NUM = 4:

[benchmark plot]

Benchmarks for NUM = 400:

[benchmark plot]
Plots were produced from this template using the following additional code:
def gen_input(n):
    return np.random.randint(1, M, (n, n))

def equal_output(a, b):
    return np.all(a == b)

funcs = (block_diag_kron, block_diag_einsum, block_diag_scipy,
         block_diag_view, block_diag_view_jit, block_diag_loop_jit)

input_sizes = tuple(int(2 ** (2 + (3 * i) / 4)) for i in range(13))
print('Input Sizes:\n', input_sizes, '\n')

runtimes, input_sizes, labels, results = benchmark(
    funcs, gen_input=gen_input, equal_output=equal_output,
    input_sizes=input_sizes)

plot_benchmarks(runtimes, input_sizes, labels, units='ms')
(EDITED to include the np.einsum()-based approach and another Numba version with explicit looping.)