I often need to sort large numpy arrays (few billion elements), which became a bottleneck of my code. I am looking for a way to parallelize it.
Are there any parallel implementations for the ndarray.sort()
function? Numexpr module provides parallel implementation for most math operations on numpy arrays, but lacks sorting capabilities.
Maybe, it is possible to make a simple wrapper around a C++ implementation of parallel sorting, and use it through Cython?
I ended up wrapping GCC parallel sort. Here is the code:
parallelSort.pyx
# cython: wraparound = False
# cython: boundscheck = False
import numpy as np
cimport numpy as np
import cython
cimport cython
ctypedef fused real:
cython.char
cython.uchar
cython.short
cython.ushort
cython.int
cython.uint
cython.long
cython.ulong
cython.longlong
cython.ulonglong
cython.float
cython.double
cdef extern from "<parallel/algorithm>" namespace "__gnu_parallel":
cdef void sort[T](T first, T last) nogil
def numpyParallelSort(real[:] a):
"In-place parallel sort for numpy types"
sort(&a[0], &a[a.shape[0]])
Extra compiler args: -fopenmp (compile) and -lgomp (linking)
This makefile will do it:
all:
cython --cplus parallelSort.pyx
g++ -g -march=native -Ofast -fpic -c parallelSort.cpp -o parallelSort.o -fopenmp `python-config --includes`
g++ -g -march=native -Ofast -shared -o parallelSort.so parallelSort.o `python-config --libs` -lgomp
clean:
rm -f parallelSort.cpp *.o *.so
And this shows that it works:
from parallelSort import numpyParallelSort
import numpy as np
a = np.random.random(100000000)
numpyParallelSort(a)
print a[:10]
edit: fixed bug noticed in the comment below
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With