
Compare Speed Python3 vs Julia [closed]

I'm starting to write a program for nonlinear beam calculations. I chose Python because of its Matlab-like syntax, and I'm currently doing speed tests (to be sure Python is the right language for fast numeric calculations) while getting familiar with Python 3. To compare the speed of Python 3 with Julia, I tried an algorithm that calculates the sum of 1/t^2 from t = 1 to n (from the book Julia High Performance). Now I have some questions:

1) In my calculations Julia is not as fast as expected. Julia is JIT compiled, so it should be very fast. Why is Python faster?

2) Why are loops in Python so slow?

3) Why is the Python sum function slower than the numpy.sum function?

4) Why does the Python sum function give a slightly different result than the numpy.sum function?

I hope you can help me.

The code:

# Benchmark Python vs Julia
# (from Julia High Performance, page 7, by Avik Sengupta)

import numpy as sp  # SciPy re-exported these NumPy functions at the time; modern SciPy removed the aliases
import time

# Sum of 1/t^2 from t = 1 to n by using loops:
# --------------------------------------------
def pisum(w,n):
    u_sum = 0
    for vi in range(w):
        u_sum = 0
        for vj in range(1,n+1,1):
            u_sum += 1.0/(vj*vj)
    return u_sum

# Sum of 1/t^2 from t = 1 to n by using scipy functions:
# ------------------------------------------------------
def pisum1(w,n):
    for vi in range(w):
        vj = sp.arange(1,n+1,1)
        vj = sp.multiply(vj,vj)
        u_sum_sp = (sp.divide(sp.ones(n),vj)).sum()
    return u_sum_sp

# Sum of 1/t^2 from t = 1 to n by using scipy functions & calculating 
# the sum via pure python:
# -------------------------------------------------------------------
def pisum2(w,n):
    for vi in range(w):
        vj = sp.arange(1,n+1,1)
        vj = sp.multiply(vj,vj)
        u_sum_py = sum(sp.divide(sp.ones(n),vj))
    return u_sum_py

# Benchmarking the methods :
# ==========================   

w = 500 
n = 10000

# 1) Loops:
# ---------    
ta = time.perf_counter()  # time.clock() was removed in Python 3.8
u_sum_loops = pisum(w,n)
eltime_loops = time.perf_counter() - ta

# 2) scipy:
# ---------
ta = time.perf_counter()
u_sum_sp = pisum1(w,n)
eltime_sp = time.perf_counter() - ta


# 3) scipy & sum via python:
# --------------------------
ta = time.perf_counter()
u_sum_py = pisum2(w,n)
eltime_py = time.perf_counter() - ta

# Julia with loops:
# -----------------
eltime_ju_loops = 0.150857295
u_sum_ju = 1.6448340718480652

row_format = '{:<35} {:<5} {:<5} {:<} {:<}'
print("Overview calculation time:")
print("-"*50)
print(row_format.format("elapsed time using loops:","%.5f" % eltime_loops,"sec. (", "%.2f"% (eltime_loops/eltime_ju_loops),"*time Julia)"))
print(row_format.format("elapsed time using scipy:","%.5f" % eltime_sp,"sec. (", "%.2f"% (eltime_sp/eltime_ju_loops),"*time Julia)"))
print(row_format.format("elapsed time using python:","%.5f" % eltime_py,"sec. (","%.2f"% (eltime_py/eltime_ju_loops),"*time Julia)"))
print(row_format.format("elapsed time using Julia and loops:","%.5f" % eltime_ju_loops,"sec.","","\n"))

line1 = "sum loops:",u_sum_loops
line2 = "sum scipy:",u_sum_sp
line3 = "sum scipy and sum via python:",u_sum_py
line4 = "sum_julia:",u_sum_ju
row_format = '{:<29} {:<18}'
print("Overview Sum:")
print("-"*50)
print(row_format.format(*line1))
print(row_format.format(*line2))
print(row_format.format(*line3))
print(row_format.format(*line4))

# Julia Code:
# ===========

# function pisum(w,n)
#     u_sum = 0;
#     for vi = 1:w
#         u_sum = 0;
#         for vj = 1:n
#             u_sum += 1.0/(vj*vj);
#         end
#     end
#     u_sum
# end
# 
# tic()
# u_sum = pisum(500,10000)
# eltime = toc()
asked Jun 16 '17 by usr_g

1 Answer

1) In my calculations Julia is not as fast as expected. Julia is JIT compiled, so it should be very fast. Why is Python faster?

Your function was not type-stable. Julia isn't fast because of its JIT compiler: it's fast because of its type system. The type system is designed to use multiple dispatch on type-stable functions (functions where the output types are a function of the input types) to fully deduce the types at every stage of the code, allowing its functions to be essentially statically compiled. Because of this, functions built by chaining together type-stable functions can themselves be type-stable, and can be compiled to match the C/Fortran code you would have wanted to write (since this is enough information for all of the relevant compiler optimizations). This is explained in more detail in this blog post.

So the type-stable code is:

function pisum(w,n)
    u_sum = 0.0;
    for vi = 1:w
        u_sum = 0.0;
        for vj = 1:n
            u_sum += 1.0/(vj*vj);
        end
    end
    u_sum
end
@time u_sum = pisum(500,10000)

That for me runs in around 0.02 seconds, which is about 2x faster than the SciPy example and 50x-100x faster than the other implementations on my computer.

Note that you can check for type-stability by calling @code_warntype pisum(500,10000), which upper-cases and red-highlights any variable whose inferred type is not concrete. It would have shown you that, in your version, u_sum starts as an Int and then turns into a Float64. Imagine writing a statically-compiled (C/Fortran) function like that: it couldn't compile, because the type of u_sum would be unknown.
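
For concreteness, here is a minimal sketch of that check on the original version (pisum_unstable is an illustrative name; the exact output formatting depends on the Julia version):

function pisum_unstable(w,n)
    u_sum = 0              # starts life as an Int ...
    for vi = 1:w
        u_sum = 0
        for vj = 1:n
            u_sum += 1.0/(vj*vj)   # ... then becomes a Float64
        end
    end
    u_sum
end

# Reports u_sum as Union{Float64, Int64}, i.e. not type-stable:
@code_warntype pisum_unstable(500,10000)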

2) Why are loops in Python so slow?

Because variables in Python are dynamic. Every time a variable is hit, the interpreter needs to find out what the variable is, find the right compiled function for it, and handle the returned value (maybe doing some conversions to hide the real underlying changes that may have occurred). It's essentially a non-type-stable Julia function.
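
As a rough illustration (a sketch with an illustrative name, not code from the question), you can mimic that dynamism in Julia by declaring the accumulator as Any, which forces the boxing and runtime lookups described above:

function pisum_any(w,n)
    u_sum::Any = 0.0       # declared Any: the compiler cannot specialize on it
    for vi = 1:w
        u_sum = 0.0
        for vj = 1:n
            u_sum += 1.0/(vj*vj)   # each += goes through dynamic dispatch
        end
    end
    u_sum
end

@time pisum_any(500,10000)   # far slower than the type-stable version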

3) Why is the Python sum function slower than the numpy.sum function?

NumPy is written assuming the array holds elements of one known type, e.g. floating point numbers. Python lists, by contrast, can hold anything. The Julia notation for this is Vector{Float64} vs Vector{Any}. In the first case you know exactly what the element type is, eliminating type checks, conversions, etc. You also know that every element takes the same amount of memory, so the values can be stored inline in the vector instead of the memory being a bunch of pointers to the real objects (and that pointer indirection disables a lot of optimizations).
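
A small sketch of that difference (v_float and v_any are illustrative names):

v_float = collect(1.0:10_000.0)    # Vector{Float64}: homogeneous, stored inline
v_any   = Vector{Any}(v_float)     # Vector{Any}: every element boxed behind a pointer

@time sum(v_float)   # element type known at compile time: fast
@time sum(v_any)     # dynamic type check per element: much slower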

4) Why does the Python sum function give a slightly different result than the numpy.sum function?

Floating point arithmetic is weird and, in particular, not associative. There is likely an optimization in SciPy/NumPy that changes the order of the computation, probably pairwise summation or some kind of loop unrolling, so the rounding errors accumulate differently.
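
A quick sketch of that order-dependence (the exact digits will vary by machine):

terms = [1.0/(t*t) for t in 1:10_000]

s_pairwise = sum(terms)                # Julia's sum uses pairwise summation
s_left     = foldl(+, terms)           # strict left-to-right accumulation
s_reversed = foldl(+, reverse(terms))

println(s_pairwise == s_left)    # typically false
println(s_left - s_reversed)     # tiny, but generally nonzero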

Moral of the Story

Code becomes fast through optimization. Code can be optimized better when the compiler has more information, because it can then make better assumptions and remove lots of unnecessary checks and indirections. Python, being fully dynamic, gives the interpreter/runtime almost no information, forcing it down the least optimized paths. You can improve on this by making Python call functions written in C for specific argument types, because then the compiler can optimize the C code (this is what SciPy does).

Julia is designed to let you give the compiler the full information of a statically-compiled language, but in a mostly dynamic language. This is due to the type system and multiple dispatch. Thus it is able to match the compiled code of C/Fortran in these cases, getting full speed without the overhead of calling through an FFI at runtime. If you write code where the compiler cannot get that information, for example by making the output types random, then it cannot optimize, and the code becomes essentially dynamic and as slow as Python. This is nice though, because C would just segfault in such cases...
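
For example (a hypothetical sketch; f and unstable_sum are illustrative names):

f() = rand(Bool) ? 1 : 1.0   # return type is Int or Float64 at random

function unstable_sum(n)
    s = 0.0
    for i in 1:n
        s += f()    # the caller must check f's return type at runtime
    end
    s
end

# Slower than a type-stable loop; with more than a couple of possible
# return types this degrades to fully dynamic, Python-style dispatch.
@time unstable_sum(10^6)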

answered Oct 24 '22 by Chris Rackauckas