Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient element-wise matrix operations in Julia

I need to perform a discretised convolution of (complex) matrices, and defined the following function in Julia:

function convolve(M::Array{Complex{Float64},2}, K::Array{Float64,2}, p::Int)
    (n,m) = size(M)
    res = zeros(Complex{Float64},n)
    for k=1:p
        for l=1:n
            res[l] += M[l,k]*K[l,end-p+k]
        end
    end
    return res
end

I use it like so:

M=complex(rand(2000,2000))
K=rand(2000,2000)
@time convolve(M,K,2000,0)

Now this is relatively fast and suprisingly, faster (about 3 times) than the vectorised version where I replace the inner loop with res += M[:,k].*K[:,end-p+k]. (I think it is due to a lot of memeory allocation for temporary arrays, I can live with that).

But a vectorised MATLAB code runs about 5 times faster:

function res = convolve(M, K, p)
    n = size(M,1);
    res = zeros(n,1);
    for k=1:p
        res = res + M(:,k).*K(:,end-p+k);
    end
end

What am I doing wrong and how do I get Julia to perform such element-wise multiplications as fast as MATLAB ? Is it an indexing problem ?

Note: I have checked with @code_warntype that there is no funny business with type indecision (no Any or Union etc), but the problem may be more subtle. The macro @code_llvm produces a suprisingly long output, but I am no expert so it is hard for me to see what is going on.

like image 503
Nico B Avatar asked Nov 22 '16 14:11

Nico B


1 Answers

The following version is faster on my machine:

function convolve2(M::Array{Complex{Float64},2}, K::Array{Float64,2}, p::Int)
    (n,m) = size(M)
    res = zeros(Complex{Float64},n)
    offset = size(K,2)-p
    (p>m || offset<0) && error("parameter p ($p) out of bounds")
    @inbounds for k=1:p
        @inbounds @simd for l=1:n
            res[l] += M[l,k]*K[l,offset+k]
        end
    end
    return res
end

Note the @simd addition which uses vector instructions currently in many CPUs.

EDIT: The performance hit in the OP's code seems to stem from using end in the index of K in the hot loop line. Redefining Base.trailingsize with @inline makes LLVM inline the end (on my machine) and make the two versions run about the same speed. The code used:

import Base: trailingsize
@inline function Base.trailingsize(A, n)
  s = 1
  for i=n:ndims(A)
    s *= size(A,i)
  end
  return s
end

See comments on the question for the issue #19389 regarding this.

like image 78
Dan Getz Avatar answered Nov 09 '22 14:11

Dan Getz