Why is this (very simple) vectorized code orders of magnitude slower than Numpy?

Question

I know very well Julia's philosophy concerning vectorization and the like, and I can accept having my code run even ten times slower than Numpy, but why does this code run much, much slower than Numpy? Maybe there is a mistake in the code; I can't imagine the issue is related to using vectorization rather than loops.

I am using vectorization because my requirements are not strong; furthermore memory allocation doesn't seem to be that incredible (and it proves to be very fast with Numpy). The code is also easy to read.

The following piece of code is written in Python; it computes a generalized continued fraction over a part of the complex plane. The continued fraction is defined by two different functions; I use this code for plotting pictures on this blog: https://kettenreihen.wordpress.com/

The Python code is below; it computes 640x640 values in about 2 seconds:

def K(a, b, C):
    s = C.shape
    P1 = np.ones(s, dtype=np.complex)
    P2 = np.zeros(s, dtype=np.complex)
    Q1 = np.zeros(s, dtype=np.complex)
    Q2 = np.ones(s, dtype=np.complex)
    for n in range(1, 65):
        A, B = a(C, n), b(C, n)
        P = A*P2 + B*P1
        Q = A*Q2 + B*Q1
        P1, P2 = P2, P
        Q1, Q2 = Q2, Q
    return P2/Q2

The following Julia code should do the same, but it takes 2 or 3 minutes for computing the same thing.

function K(a, b, C)
    s = size(C)
    P1 = ones(Complex, s)
    P2 = zeros(Complex, s)
    Q1 = zeros(Complex, s)
    Q2 = ones(Complex, s)
    for n = 1:64
        println(n)
        A, B = a(C, n), b(C, n)
        P = A.*P2 + B.*P1
        Q = A.*Q2 + B.*Q1
        P1, P2 = P2, P
        Q1, Q2 = Q2, Q
    end
    return P2./Q2
end

StefanKarpinski · Accepted Answer

You are allocating matrices with an abstract element type: the Complex type is an abstract supertype for all specific Complex{T} types. What you want here is arrays of some concrete element type like Complex128 == Complex{Float64} or Complex64 == Complex{Float32}. Presumably in NumPy dtype=np.complex refers to a specific complex type, probably the equivalent of Complex128.

If you want to write code that is generic and will work for different kinds of C matrices, then assuming C is a complex matrix and what you want is to create matrices of ones and zeros of the same element type and shape, you can just call the ones and zeros functions on C to get matrices with the right element type and shape:

function K(a, b, C)
    P1 = ones(C)
    P2 = zeros(C)
    Q1 = zeros(C)
    Q2 = ones(C)
    for n = 1:64
        println(n)
        A, B = a(C, n), b(C, n)
        P = A.*P2 + B.*P1
        Q = A.*Q2 + B.*Q1
        P1, P2 = P2, P
        Q1, Q2 = Q2, Q
    end
    return P2./Q2
end

Hopefully that helps the performance. You might be able to get even more performance by pre-allocating three matrices and doing operations in place, rotating through the matrices. However, the convenient syntax support for that hasn't made it into a stable release yet, so on Julia 0.5 it would still be a little verbose, but could get you a performance boost over the vectorized version.

Why is this (very simple) vectorized code orders of magnitude slower than Numpy?

Tags:

vectorization

julia

Thomas Baruchel

1 Answers

StefanKarpinski

Recent Activity

Donate For Us

Why is this (very simple) vectorized code orders of magnitude slower than Numpy?

Tags:

vectorization

julia

Thomas Baruchel

1 Answers

StefanKarpinski

Related questions

Recent Activity

Donate For Us