Fast way to convert upper triangular matrix into symmetric matrix

I have an upper-triangular matrix of np.float64 values, like this:

array([[ 1.,  2.,  3.,  4.],
       [ 0.,  5.,  6.,  7.],
       [ 0.,  0.,  8.,  9.],
       [ 0.,  0.,  0., 10.]])

I would like to convert this into the corresponding symmetric matrix, like this:

array([[ 1.,  2.,  3.,  4.],
       [ 2.,  5.,  6.,  7.],
       [ 3.,  6.,  8.,  9.],
       [ 4.,  7.,  9., 10.]])

The conversion can be done in place, or as a new matrix. I would like it to be as fast as possible. How can I do this quickly?

Kerrick Staley asked Nov 05 '19 19:11



3 Answers

np.where seems quite fast in the out-of-place, no-cache scenario:

np.where(ut,ut,ut.T)

On my laptop:

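from timeit import timeit

timeit(lambda:np.where(ut,ut,ut.T))
# 1.909718865994364 (seconds for timeit's default 1,000,000 calls, i.e. about 1.9 μs per call)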
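As a quick sanity check, here is a minimal sketch of what the np.where call produces for the matrix from the question. It relies on the lower triangle being exactly zero; the `dirty`, `wrong` and `safe` names are illustrative only, and the np.triu variant is an alternative added here for comparison, not something that was timed above.

```python
import numpy as np

ut = np.array([[1., 2., 3., 4.],
               [0., 5., 6., 7.],
               [0., 0., 8., 9.],
               [0., 0., 0., 10.]])

# Where ut is nonzero keep ut, otherwise take the mirrored entry from ut.T.
sym = np.where(ut, ut, ut.T)

# Assumption check: a leftover nonzero value below the diagonal is kept as-is,
# so the result is no longer symmetric.
dirty = ut.copy()
dirty[3, 0] = -1.0
wrong = np.where(dirty, dirty, dirty.T)

# Variant that ignores whatever sits below the diagonal.
safe = np.triu(dirty) + np.triu(dirty, k=1).T
```

Zeros inside the upper triangle are harmless here, because the mirrored entry that replaces them is also zero.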

If you have pythran installed, you can speed this up about 3 times with near-zero effort. Note, however, that as far as I know pythran (currently) only understands contiguous arrays.

File upp2sym.py, compiled with pythran -O3 upp2sym.py:

import numpy as np

#pythran export upp2sym(float[:,:])

def upp2sym(a):
    return np.where(a,a,a.T)

Timing:

from upp2sym import *

timeit(lambda:upp2sym(ut))
# 0.5760842661838979

This is almost as fast as looping:

#pythran export upp2sym_loop(float[:,:])

def upp2sym_loop(a):
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i,i] = a[i,i]
        for j in range(i):
            out[i,j] = out[j,i] = a[j,i]
    return out

Timing:

timeit(lambda:upp2sym_loop(ut))
# 0.4794591029640287

We can also do it inplace:

#pythran export upp2sym_inplace(float[:,:])

def upp2sym_inplace(a):
    for i in range(len(a)):
        for j in range(i):
            a[i,j] = a[j,i]

Timing:

timeit(lambda:upp2sym_inplace(ut))
# 0.28711927914991975
Paul Panzer answered Oct 01 '22 01:10



This is the fastest routine I've found so far that doesn't use Cython or a JIT like Numba. It takes about 1.6 μs on my machine to process a 4x4 array (average time over a list of 100K 4x4 arrays):

inds_cache = {}

def upper_triangular_to_symmetric(ut):
    n = ut.shape[0]
    try:
        inds = inds_cache[n]
    except KeyError:
        inds = np.tri(n, k=-1, dtype=bool)  # builtin bool; the np.bool alias is missing from some NumPy releases
        inds_cache[n] = inds
    ut[inds] = ut.T[inds]
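The same caching idea can also be sketched with functools.lru_cache instead of a hand-rolled dict. This is only an illustrative variant: the names `lower_mask` and `symmetrize_cached` are hypothetical, and it has not been timed against the version above.

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=None)
def lower_mask(n):
    # Boolean mask that is True strictly below the diagonal of an n x n array.
    return np.tri(n, k=-1, dtype=bool)


def symmetrize_cached(ut):
    # Overwrite the lower triangle in place with the mirrored upper triangle.
    # ut.T[inds] is a copy (boolean indexing), so the assignment does not alias.
    inds = lower_mask(ut.shape[0])
    ut[inds] = ut.T[inds]


ut = np.array([[1., 2., 3., 4.],
               [0., 5., 6., 7.],
               [0., 0., 8., 9.],
               [0., 0., 0., 10.]])
symmetrize_cached(ut)
```

The cached mask is a shared array, so callers should treat it as read-only.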

Here are some other things I've tried that are not as fast:

The above code, but without the cache. Takes about 8.3 μs per 4x4 array:

def upper_triangular_to_symmetric(ut):
    n = ut.shape[0]
    inds = np.tri(n, k=-1, dtype=bool)  # builtin bool; the np.bool alias is missing from some NumPy releases
    ut[inds] = ut.T[inds]

A plain Python nested loop. Takes about 2.5 μs per 4x4 array:

def upper_triangular_to_symmetric(ut):
    n = ut.shape[0]
    for r in range(1, n):
        for c in range(r):
            ut[r, c] = ut[c, r]

Floating point addition using np.triu. Takes about 11.9 μs per 4x4 array:

def upper_triangular_to_symmetric(ut):
    ut += np.triu(ut, k=1).T
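Per-array figures like the ones above could be measured with a harness along the following lines. This is a rough sketch, not the setup actually used for those numbers: the helper name `per_array_microseconds`, the 10,000-array list (a smaller stand-in for the 100K list) and the best-of-3 choice are all assumptions.

```python
import timeit

import numpy as np


def symmetrize_loop(ut):
    # Plain nested loop: copy each above-diagonal entry to its mirror position.
    n = ut.shape[0]
    for r in range(1, n):
        for c in range(r):
            ut[r, c] = ut[c, r]


def per_array_microseconds(func, arrays, repeat=3):
    # Best-of-`repeat` wall time for one pass over all arrays, in μs per array.
    def one_pass():
        for a in arrays:
            func(a)

    best = min(timeit.repeat(one_pass, number=1, repeat=repeat))
    return best / len(arrays) * 1e6


rng = np.random.default_rng(0)
# Random upper-triangular 4x4 inputs (hypothetical test data).
arrays = [np.triu(rng.random((4, 4))) for _ in range(10_000)]

elapsed = per_array_microseconds(symmetrize_loop, arrays)
print(f"{elapsed:.2f} μs per array")
```

Because the functions work in place, later passes see already-symmetric arrays; the amount of work per call stays the same, so the timing is unaffected.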

Numba version of the Python nested loop. This was the fastest thing I found (about 0.4 μs per 4x4 array), and it is what I ended up using in production, at least until I started running into issues with Numba and had to revert to a pure Python version:

import numba

@numba.njit()
def upper_triangular_to_symmetric(ut):
    n = ut.shape[0]
    for r in range(1, n):
        for c in range(r):
            ut[r, c] = ut[c, r]

Cython version of Python nested loop. I'm new to Cython so this may not be fully optimized. Since Cython adds operational overhead, I'm interested in hearing both Cython and pure-Numpy answers. Takes about 0.6 μs per 4x4 array:

cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def upper_triangular_to_symmetric(np.ndarray[np.float64_t, ndim=2] ut):
    cdef int n, r, c
    n = ut.shape[0]
    for r in range(1, n):
        for c in range(r):
            ut[r, c] = ut[c, r]
Kerrick Staley answered Oct 01 '22 00:10



You are mainly measuring function call overhead on such tiny problems

Another way to do that would be to use Numba. Let's start with an implementation for only one (4x4) array.

Only one 4x4 array

import numpy as np
import numba as nb

@nb.njit()
def sym(A):
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A[j,i]=A[i,j]
    return A


A=np.array([[ 1.,  2.,  3.,  4.],
       [ 0.,  5.,  6.,  7.],
       [ 0.,  0.,  8.,  9.],
       [ 0.,  0.,  0., 10.]])

%timeit sym(A)
#277 ns ± 5.21 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Larger example

@nb.njit(parallel=False)
def sym_3d(A):
    for i in nb.prange(A.shape[0]):
        for j in range(A.shape[1]):
            for k in range(A.shape[2]):
                A[i,k,j]=A[i,j,k]
    return A

A=np.random.rand(1_000_000,4,4)

%timeit sym_3d(A)
#13.8 ms ± 49.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#13.8 ns per 4x4 submatrix
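If Numba is not available, the batching idea (one call for the whole stack instead of one call per matrix) can also be sketched in plain NumPy with index arrays. The function name `symmetrize_batch` is hypothetical, and no timing is claimed for it here.

```python
import numpy as np


def symmetrize_batch(stack):
    # Mirror the upper triangle of every matrix in an (N, n, n) stack, in place.
    n = stack.shape[-1]
    rows, cols = np.tril_indices(n, k=-1)  # positions strictly below the diagonal
    # The right-hand side is a copy (advanced indexing), so there is no aliasing.
    stack[:, rows, cols] = stack[:, cols, rows]
    return stack


rng = np.random.default_rng(0)
# np.triu acts on the last two axes, giving a stack of upper-triangular matrices.
upper = np.triu(rng.random((1000, 4, 4)))
result = symmetrize_batch(upper.copy())
```

This amortises the Python call overhead over the whole stack, which is the point made at the start of this answer.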
max9111 answered Oct 01 '22 01:10
