Julia performance compared to Python+Numba LLVM/JIT-compiled code

The performance benchmarks for Julia I have seen so far, such as those at http://julialang.org/, compare Julia to pure Python or Python+NumPy. Unlike NumPy, SciPy uses the BLAS and LAPACK libraries, which provide an optimal multi-threaded SIMD implementation. If we assume that Julia and Python performance are the same when calling BLAS and LAPACK functions under the hood, how does Julia performance compare to CPython using Numba or NumbaPro for code that doesn't call BLAS or LAPACK functions?
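For concreteness, this is the kind of non-BLAS micro-benchmark I have in mind: the same scalar loop run as pure Python and as a Numba-compiled function (the function name, loop body, and problem size here are just illustrative, not a benchmark anyone has published):

```python
# Illustrative micro-benchmark sketch: a scalar loop that does not map to
# BLAS/LAPACK, timed as pure Python and as Numba LLVM-compiled code.
import math
import time
import numpy as np
from numba import njit

def series_sum_py(x):
    total = 0.0
    for i in range(x.shape[0]):
        total += math.sin(x[i]) * math.cos(x[i])
    return total

series_sum_nb = njit(series_sum_py)  # same source, compiled by Numba's LLVM back end

x = np.random.rand(1_000_000)
series_sum_nb(x)  # warm-up call so compilation time is excluded from the timing

for func, label in [(series_sum_py, "pure Python"), (series_sum_nb, "Numba @njit")]:
    t0 = time.perf_counter()
    func(x)
    print(f"{label}: {time.perf_counter() - t0:.4f} s")
```

The equivalent Julia loop would be timed the same way to get a like-for-like comparison.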

One thing I notice is that Julia is using LLVM v3.3, while Numba uses llvmlite, which is built on LLVM v3.5. Does Julia's old LLVM prevent an optimum SIMD implementation on newer architectures, such as Intel Haswell (AVX2 instructions)?

I am interested in performance comparisons for both spaghetti code and small DSP loops operating on very large vectors. For me the latter is handled more efficiently by the CPU than by the GPU because of the overhead of moving data in and out of GPU device memory. I am only interested in performance on a single Intel Core i7 CPU, so cluster performance is not important to me. Of particular interest is how easily and successfully DSP functions can be parallelized.
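As a sketch of the sort of DSP kernel I mean, here is a moving-average FIR filter parallelized across CPU cores. The `parallel=True`/`prange` machinery shown is a feature of later open-source Numba releases rather than of the versions current when this was asked, and the kernel itself is purely illustrative:

```python
# Sketch of a small DSP kernel (moving-average FIR filter) parallelized
# across the cores of a single CPU with Numba's prange.
import numpy as np
from numba import njit, prange

@njit(parallel=True, fastmath=True)
def moving_average(x, ntaps):
    n = x.shape[0] - ntaps + 1
    out = np.empty(n)
    for i in prange(n):          # outer loop is split across CPU threads
        acc = 0.0
        for k in range(ntaps):   # inner loop is a candidate for SIMD vectorization
            acc += x[i + k]
        out[i] = acc / ntaps
    return out

signal = np.random.rand(10_000_000)
smoothed = moving_average(signal, 32)
```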

A second part of this question is a comparison of Numba to NumbaPro (ignoring the MKL BLAS). Is NumbaPro's target="parallel" really needed, given the new nogil argument for the @jit decorator in Numba?
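To illustrate the nogil route I am asking about: an `@njit(nogil=True)` kernel releases the GIL, so ordinary Python threads can run it concurrently on separate chunks of a vector. Whether this is as convenient or as fast as NumbaPro's target="parallel" is exactly the comparison in question; the chunking scheme and function names below are illustrative only:

```python
# Sketch: thread-level parallelism via nogil=True plus a thread pool.
from concurrent.futures import ThreadPoolExecutor
import numpy as np
from numba import njit

@njit(nogil=True)
def scale_chunk(x, out, start, stop, gain):
    # Simple element-wise kernel; runs without holding the GIL.
    for i in range(start, stop):
        out[i] = gain * x[i]

def scale_threaded(x, gain, nthreads=4):
    out = np.empty_like(x)
    bounds = np.linspace(0, x.shape[0], nthreads + 1).astype(np.int64)
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        futures = [pool.submit(scale_chunk, x, out, lo, hi, gain)
                   for lo, hi in zip(bounds[:-1], bounds[1:])]
        for f in futures:
            f.result()  # re-raise any exception from the worker threads
    return out

x = np.random.rand(4_000_000)
y = scale_threaded(x, 0.5)
```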

asked Apr 09 '15 by hiccup

People also ask

Is Julia faster than PyPy?

Julia seems to be faster than PyPy.

Is Numba as fast as C?

The machine code generated by Numba is as fast as languages like C, C++, and Fortran without having to code in those languages. Numba works really well with NumPy arrays, which is one of the reasons why it is used more and more in scientific computing.

Is Julia faster than Numba?

Numba is 10X faster than pure Python for the micro-benchmark of a simple quadrature rule. However, Julia is still more than 3X faster than Numba, in part due to SIMD optimizations enabled by LoopVectorization.
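The snippet above does not show the quadrature benchmark it refers to; a midpoint-rule integrator like the following is one plausible shape for such a micro-benchmark (entirely illustrative, not the benchmark behind the quoted numbers):

```python
# Midpoint-rule quadrature of sin(x) on [0, pi]; the exact result is 2.0.
import math
from numba import njit

def quad_midpoint(n):
    h = math.pi / n
    total = 0.0
    for i in range(n):
        total += math.sin((i + 0.5) * h)
    return total * h

quad_midpoint_nb = njit(quad_midpoint)  # same loop, compiled by Numba
print(quad_midpoint(1_000_000), quad_midpoint_nb(1_000_000))
```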

Should I use Cython or Numba?

Cython is easier to distribute than Numba, which makes it a better option for user facing libraries. It's the preferred option for most of the scientific Python stack, including NumPy, SciPy, pandas and Scikit-Learn. In contrast, there are very few libraries that use Numba.


1 Answer

This is a very broad question. Regarding the benchmark requests, you may be best off running a few small benchmarks yourself that match your own needs. To answer one of the questions:

One thing I notice is that Julia is using LLVM v3.3, while Numba uses llvmlite, which is built on LLVM v3.5. Does Julia's old LLVM prevent an optimum SIMD implementation on newer architectures, such as Intel Haswell (AVX2 instructions)?

[2017/01+: The information below no longer applies to current Julia releases]

Julia does turn off AVX2 with LLVM 3.3 because there were some deep bugs on Haswell.

Julia is built with LLVM 3.3 for the current releases and nightlies, but you can build with 3.5, 3.6, and usually svn trunk (if we haven't yet updated for some API change on a given day, please file an issue). To do so, set LLVM_VER=svn (for example) in Make.user and then proceed to follow the build instructions.

answered Sep 28 '22 by Isaiah Norton