Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Julia - the way of kings (generator performance)

Tags:

julia

I had some python code which I tried to port to Julia to learn this lovely language. I used generators in python. After porting it seems to me (at this moment) that Julia is really slow in this area!

I made part of my code simplified to this exercise:

Think 4x4 chess board. Find every N-moves long path, chess king could do. In this exercise, the king is not allowed to leap twice at the same position in one path. Don't waste memory -> make a generator of every path.

Algorithm is pretty simple:

if we sign every position with numbers:

0  1  2  3
4  5  6  7
8  9  10 11
12 13 14 16

point 0 has 3 neighbors (1, 4, 5). We could find a table for every neighbor for every point:

NEIG = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9], [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11], [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15], [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]]

PYTHON

A recursive function (generator) which enlarge given path from the list of points or from a generator of (generator of ...) points:

def enlarge(path):
    if isinstance(path, list):
        for i in NEIG[path[-1]]:
            if i not in path:
                yield path[:] + [i]
    else:
        for i in path:
            yield from enlarge(i)

Function (generator) which give every path with given length

def paths(length):
    steps = ([i] for i in range(16))  # first steps on every point on board
    for _ in range(length-1):
        nsteps = enlarge(steps)
        steps = nsteps
    yield from steps

We could see that there are 905776 paths with length 10:

sum(1 for i in paths(10))
Out[89]: 905776

JULIA (this code was created by @gggg during our discussion here )

const NEIG_py = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9], [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11], [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15], [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]];
const NEIG = [n.+1 for n in NEIG_py]
function enlarge(path::Vector{Int})
    (push!(copy(path),loc) for loc in NEIG[path[end]] if !(loc in path))
end
collect(enlarge([1]))
function enlargepaths(paths)
    Iterators.Flatten(enlarge(path) for path in paths)
end
collect(enlargepaths([[1],[2]]))
function paths(targetlen)
    paths = ([i] for i=1:16)
    for newlen in 2:targetlen
        paths = enlargepaths(paths)
    end
    paths
end
p = sum(1 for path in paths(10))

benchmark

In ipython we could time it:

python 3.6.3:

%timeit sum(1 for i in paths(10))
1.25 s ± 15.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

julia 0.6.0

julia> @time sum(1 for path in paths(10))
  2.690630 seconds (41.91 M allocations: 1.635 GiB, 11.39% gc time)
905776

Julia 0.7.0-DEV.0

julia> @time sum(1 for path in paths(10))
  4.951745 seconds (35.69 M allocations: 1.504 GiB, 4.31% gc time)
905776

Question(s):

We Julians are saying this: It is important to note that the benchmark codes are not written for absolute maximal performance (the fastest code to compute recursion_fibonacci(20) is the constant literal 6765). Instead, the benchmarks are written to test the performance of identical algorithms and code patterns implemented in each language.

In this benchmark, we are using the same idea. Just simple for cycles over arrays enclosed to generators. (Nothing from numpy, numba, pandas or others c-written and compiled python packages)

Is assumption that Julia's generators are terribly slow right?

What could we do to make it really fast?

like image 755
Liso Avatar asked Nov 03 '17 06:11

Liso


2 Answers

Julia's "better performance" than Python isn't magical. Most of it stems directly from the fact that Julia can figure out what each variable's type within a function will be, and then compile highly specialized code for those specific types. This even applies to the elements in many containers and iterables like generators; Julia often knows ahead of time what type the elements will be. Python isn't able to do this analysis nearly as easily (or at all, in many cases), so its optimizations have focused on improving the dynamic behaviors.

In order for Julia's generators to know ahead of time what kinds of types they might produce, they encapsulate information about both the operation they perform and the object they iterate over in the type:

julia> (1 for i in 1:16)
Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##27#28"))}(getfield(Main, Symbol("##27#28"))(), 1:16)

That weird ##27#28 thing is the type of an anonymous function that simply returns 1. By the time the generator gets to LLVM, it knows enough to perform quite a large number of optimizations:

julia> function naive_sum(c)
           s = 0
           for elt in c
               s += elt
           end
           s
       end
       @code_llvm naive_sum(1 for i in 1:16)

; Function naive_sum
; Location: REPL[1]:2
define i64 @julia_naive_sum_62385({ { i64, i64 } } addrspace(11)* nocapture nonnull readonly dereferenceable(16)) {
top:
; Location: REPL[1]:3
  %1 = getelementptr inbounds { { i64, i64 } }, { { i64, i64 } } addrspace(11)* %0, i64 0, i32 0, i32 0
  %2 = load i64, i64 addrspace(11)* %1, align 8
  %3 = getelementptr inbounds { { i64, i64 } }, { { i64, i64 } } addrspace(11)* %0, i64 0, i32 0, i32 1
  %4 = load i64, i64 addrspace(11)* %3, align 8
  %5 = add i64 %4, 1
  %6 = sub i64 %5, %2
; Location: REPL[1]:6
  ret i64 %6
}

It may take a minute to parse through the LLVM IR there, but you should be able to see that it's just extracting the endpoints of the UnitRange (getelementptr and load), subtracting them from each other (sub) and adding one to compute the sum without a single loop.

In this case, though, it works against Julia: paths(10) has a ridiculously complicated type! You're iteratively wrapping that one generator in filters and flattens and yet more generators. It becomes so complicated, in fact, that Julia just gives up trying to figure out with it and decides to live with the dynamic behavior. And at this point, it no longer has an inherent advantage over Python — in fact specializing on so many different types as it recursively walks through the object would be a distinct handicap. You can see this in action by looking at @code_warntype start(1 for i in paths(10)).

My rule of thumb for Julia's performance is that type-stable, devectorized code that avoids allocations is typically within an factor of 2 of C, and dynamic, unstable, or vectorized code is within an order of magnitude of Python/MATLAB/other higher level languages. Often it's a bit slower simply because the other higher level languages have pushed very hard to optimize their case, whereas the majority of Julia's optimizations have been focused on the type-stable side of things. This deeply nested construct puts you squarely in the dynamic camp.

So are Julia's generators terribly slow? Not inherently so; it's just when they become so deeply nested like this that you hit this bad case.

like image 197
mbauman Avatar answered Sep 26 '22 01:09

mbauman


const NEIG_py = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9], [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11], [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15], [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]];
const NEIG = [n.+1 for n in NEIG_py];

function expandto(n, path, targetlen)
    length(path) >= targetlen && return n+1
    for loc in NEIG[path[end]]
        loc in path && continue
        n = expandto(n, (path..., loc), targetlen)
    end
    n
end

function npaths(targetlen)
    n = 0
    for i = 1:16
        path = (i,)
        n = expandto(n, path, targetlen)
    end
    n
end

Benchmark (after executing once for JIT-compilation):

julia> @time npaths(10)
  0.069531 seconds (5 allocations: 176 bytes)
905776

which is considerably faster.

like image 41
tholy Avatar answered Sep 26 '22 01:09

tholy