Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A way to pass milions of items in python to C program many times in rapid succesion

I've wrote a python script that need to pass millions of items to a C program and receive its output many times in a short period (pass from 1 up to 10 millions of vertices data (integer index and 2 float coords) rapidly 500 times, and each time the python script call the C program, i need to store the returned values in variables). I already implemented a way reading and writing text and or binary files, but it's slow and not smart(why write files to hdd while you don't need to store the data after the python script terminates?). I tried to use pipes, but for large data they gave me errors... So, by now i think the best way can be using the ability of ctypes to load functions in .dll Since i've never created a dll, i would like to know how to set it up (i know many ide have a template for this, but my wxdev-c++ crashes when i try to open it. Right now i'm downloading Code::Blocks )

Can you tell me if the solution i'm starting to implement is right, or if there is a better solution? The 2 functions i need to call in python are these

void find_vertex(vertex *list, int len, vertex* lower, vertex* highter)
{
    int i;
    *lower=list[0];
    *highter=list[1];
    for(i=0;i<len;i++)
    {
        if ((list[i].x<=lower->x) && (list[i].y<=lower->y))
            *lower=list[i];
        else
        {
            if ((list[i].x>=highter->x) && (list[i].y>=highter->y))
                *highter=list[i];
        }
    }
}

and

vertex *square_list_of_vertex(vertex *list,int len,vertex start, float size)
{
    int i=0,a=0;
    unsigned int *num;
    num=(int*)malloc(sizeof(unsigned int)*len);
    if (num==NULL)
    {
        printf("Can't allocate the memory");
        return 0;
    }
    //controlls which points are in the right position and adds their index from the main list in another list
    for(i=0;i<len;i++)
    {
        if ((list[i].x-start.x)<size && (list[i].y-start.y<size))
        {
            if (list[i].y-start.y>-size/100)
            {
                num[a]=i;
                a++;//len of the list to return
            }
        }
    }

    //create the list with the right vertices
    vertex *retlist;
    retlist=(vertex*)malloc(sizeof(vertex)*(a+1));
    if (retlist==NULL)
    {
        printf("Can't allocate the memory");
        return 0;
    }
    //the first index is used only as an info container
    vertex infos;
    infos.index=a+1;
    retlist[0]=infos;

    //set the value for the return pointer
    for(i=1;i<=a;i++)
    {
        retlist[i]=list[num[i-1]];
    }

    return retlist;
}

EDIT: forgot to post the type defintion of vertex

typedef struct{
    int index;
    float x,y;
} vertex;

EDIT2: I'll redistribute the code, so i prefer not to use external modules in python and external programs in C. Alsa i want try to keep the code cross platform. The script is an addon for a 3D app, so the less it uses external "stuff" the better it is.

like image 385
Makers_F Avatar asked Feb 14 '11 22:02

Makers_F


2 Answers

Using ctypes or Cython to wrap your C functions is definitely the way to go. That way, you won't even need to copy the data between the C and Python code -- both the C and the Python part run within the same process and access the same data. Let's stick with ctypes, since this is what you suggested. Additionally, using NumPy will make this a lot more comfortable.

I infer your vertex type looks like this:

typedef struct
{
    int index;
    float x, y;
} vertex;

To have these vertices in a NumPy array, you can define a record "dtype" for it:

vertex_dtype = [('index', 'i'), ('x', 'f'), ('y', 'f')]

Also define this type as a ctypes structure:

class Vertex(ctypes.Structure):
    _fields_ = [("index", ctypes.c_int),
                ("x", ctypes.c_float),
                ("y", ctypes.c_float)]

Now, the ctypes prototype for your function find_vertex() would look like this:

from numpy.ctypeslib import ndpointer
lib = ctypes.CDLL(...)
lib.find_vertex.argtypes = [ndpointer(dtype=vertex_dtype, flags="C_CONTIGUOUS"),
                            ctypes.c_int,
                            ctypes.POINTER(Vertex),
                            ctypes.POINTER(Vertex)]
lib.find_vertex.restypes = None

To call this function, create a NumPy array of vertices

vertices = numpy.empty(1000, dtype=vertex_dtype)

and two structures for the return values

lower = Vertex()
higher = Vertex()

and finally call your function:

lib.find_vertex(vertices, len(vertices), lower, higher)

NumPy and ctypes will take care of passing the pointer to the beginning of the data of vertices to your C function -- no copying required.

Probably, you will have to read a bit of documentation on ctypes and NumPy, but I hope this answer helps you to get started with it.

like image 69
Sven Marnach Avatar answered Sep 23 '22 23:09

Sven Marnach


It seems like what you really want is to turn your C program into a Python module. Here is a tutorial that will get you started.

like image 38
nmichaels Avatar answered Sep 24 '22 23:09

nmichaels