Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cython: How to sort the contents of one vector by another vector?

Tags:

c++

python

cython

I'm trying to sort two C++ vectors in Cython, one by the contents of itself and one by the contents of the first vector.

cimport cython
from libcpp.vector cimport vector
from libcpp.algorithm cimport sort as stdsort

def function():

    cdef vector[np.npy_float] distances
    cdef vector[np.npy_intp] indices

    d = [9., 8., 3., 2., 3.]
    for i in range(len(d)):
        distances.push_back(d[i])
        indices.push_back(i)

    stdsort(distances.begin(), distances.end())
    // distances = [2.0, 3.0, 3.0, 8.0, 9.0]
    // Sort indices by distances?

    return distances, indices

I know in pure C++ you can easily do this with an object that houses both the distance and the index and having a custom sorting function for that object, but what's an easy way to do this in Cython?

like image 483
narcissa Avatar asked Oct 16 '22 01:10

narcissa


1 Answers

For getting sorting indices in C++, one usually would not create a new vector with index-value pairs, but choose a comparator, which would achieve the same without actually copying memory.

Translating this to Cython would look as follows:

%%cython -+ -c=-std=c++11
from libcpp.vector cimport vector
cdef extern from *:
    """
    #include <algorithm>
    #include <vector>
    void sort_via_score(std::vector<int>& indices, const std::vector<double>& scores){
        std::sort(indices.begin(), indices.end(),
                  [&scores](int i, int j){return scores.at(i)<scores.at(j);}
                 );
    }
    """
    void sort_via_score(vector[int]& indices, vector[double]& scores)
    
def sort_indices(lst):
    cdef vector[double] scores = lst
    cdef vector[int] indices = range(len(lst))
    sort_via_score(indices, scores)
    return indices

The function sort_indices is a wrapper which allows us to check the implementation quickly:

sort_indices([5,4,3,2,1])
# [4, 3, 2, 1, 0] as expected

sort_via_score works similar to the following one-liner in Python:

def sort_indices_py(scores):
    return sorted(range(len(scores)), key=lambda x: scores[x])

the scores-vector is used in the closure to look-up the score of the index . There are no new objects created which would put index and its score together in memory - they are combined by logic of the key-function alone.


The solution above uses verbatim C-code, because it is so much easier to write C++ code with C++ than with Cython.

If one really wants to stick to "pure" Cython (I don't recomend), so it is possible to emulate the C++-closures with the following code:

%%cython -+
from libcpp.vector cimport vector
from libcpp.algorithm cimport sort as stdsort

cdef vector[double]* vec
cdef bint comp_fun(int i, int j):
    return vec.at(i)<vec.at(j)

def sort_indices2(lst):
    cdef vector[double] scores = lst
    cdef vector[int] indices = range(len(lst))
    global vec
    vec = &scores # "global closure"
    stdsort(indices.begin(), indices.end(), comp_fun)
    return indices
like image 148
ead Avatar answered Oct 22 '22 09:10

ead