 

Pass FILE * into function from Python / ctypes

Tags: python, c, ctypes

I have a library function (written in C) that generates text by writing its output to a FILE *. I want to wrap this in Python (2.7.x) with code that creates a temp file or pipe, passes it into the function, reads the result from the file, and returns it as a Python string.

Here's a simplified example to illustrate what I'm after:

#include <stdio.h>

/* Library function */
void write_numbers(FILE * f, int arg1, int arg2)
{
   fprintf(f, "%d %d\n", arg1, arg2);
}

Python wrapper:

import os
from ctypes import *
mylib = CDLL('mylib.so')


def write_numbers( a, b ):
   rd, wr = os.pipe()

   write_fp = MAGIC_HERE(wr)
   mylib.write_numbers(write_fp, a, b)
   os.close(wr)

   read_file = os.fdopen(rd)
   res = read_file.read()
   read_file.close()

   return res

#Should result in '1 2\n' being printed.
print write_numbers(1,2)

I'm wondering what my best bet is for MAGIC_HERE().

I'm tempted to just use ctypes and create a libc.fdopen() wrapper that returns a Python c_void_p, then pass that into the library function. It seems like that should be safe in theory; I'm just wondering if there are issues with that approach or an existing Python-ism to solve this problem.

Also, this will go in a long-running process (let's just assume "forever"), so any leaked file descriptors are going to be problematic.

Brian McFarland, asked Oct 23 '15 20:10


1 Answer

First, do note that FILE* is a stdio-specific entity. It doesn't exist at the system level. What exists at the system level are descriptors in UNIX (retrieved with file.fileno(); os.pipe() already returns plain descriptors) and handles in Windows (retrieved with msvcrt.get_osfhandle()). Thus FILE* is a poor choice as an inter-library exchange format if more than one C runtime can be in play. You'll be in trouble if your library is compiled against a different C runtime than your copy of Python: 1) the binary layouts of the structure may differ (e.g. due to alignment, additional members for debugging purposes, or even different type sizes); 2) in Windows, the file descriptors that the structure links to are C-specific entities as well, and their table is maintained internally by each C runtime.[1]

Moreover, in Python 3, I/O was overhauled to untangle it from stdio, so FILE* is alien to that Python flavor (and likely to most non-C flavors, too).

Now, what you need is to

  • somehow guess which C runtime you need, and
  • call its fdopen() (or equivalent).

(One of Python's mottoes is "make the right thing easy and the wrong thing hard", after all)


The cleanest method is to use the exact C runtime instance that the library is linked to (and do pray that it's linked to it dynamically, or there'll be no exported symbol to call).

For the 1st item, I couldn't find any Python modules that can analyze a loaded dynamic module's metadata to find out which DLLs/.so's it has been linked with (just a name, or even name+version, isn't enough, you know, due to possible multiple instances of the library on the system). It's definitely possible, though, since information about the formats involved is widely available.
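The answer doesn't name a tool for this, so here is an assumption-laden, Linux-only illustration of the idea: /proc/self/maps lists the exact files mapped into the current process, which at least shows which C runtime instance(s) are really loaded, by path rather than by name. It does not tell you which of them a given library was linked against, and the filename patterns below are a heuristic assumption, not a rule.

```python
import os


def loaded_c_runtimes(maps_path="/proc/self/maps"):
    """Return distinct paths of C-runtime-looking files mapped into this process.

    Linux-only sketch. The name patterns (glibc's libc.so.6 / libc-2.x.so,
    musl's ld-musl-*.so.1) are a heuristic assumption.
    """
    paths = []
    if not os.path.exists(maps_path):
        return paths  # not Linux, or /proc is not mounted
    with open(maps_path) as maps:
        for line in maps:
            # Format: address perms offset dev inode [pathname]
            fields = line.split(None, 5)
            if len(fields) < 6:
                continue  # anonymous mapping, no pathname
            path = fields[5].strip()
            name = os.path.basename(path)
            looks_like_libc = (name.startswith("libc.so")
                               or name.startswith("libc-")
                               or name.startswith("ld-musl"))
            if looks_like_libc and path not in paths:
                paths.append(path)
    return paths


print(loaded_c_runtimes())
```

If this prints more than one path, more than one C runtime is loaded, and the warnings above about mixing FILE* between runtimes apply directly.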

For the 2nd item, it's a trivial ctypes.CDLL('path').fdopen (_fdopen for MSVCRT).
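A minimal sketch of the fdopen() route for the question's pipe, assuming a POSIX system where mylib.so uses the same C runtime that is already loaded into the Python process (ctypes.CDLL(None) looks symbols up in the process's global namespace). Since mylib.so isn't available here, libc's own fprintf() stands in for the hypothetical mylib.write_numbers(); the helper name fdopen_w() is likewise illustrative. Written in Python 3 syntax, so the result is bytes.

```python
import ctypes
import os

# Assumption: mylib.so shares the C runtime already loaded into this process.
libc = ctypes.CDLL(None, use_errno=True)

# Without prototypes, ctypes would treat the FILE* result as a C int.
libc.fdopen.restype = ctypes.c_void_p
libc.fdopen.argtypes = (ctypes.c_int, ctypes.c_char_p)
libc.fclose.restype = ctypes.c_int
libc.fclose.argtypes = (ctypes.c_void_p,)


def fdopen_w(fd):
    """Wrap a raw descriptor in a FILE* opened for writing (returned as an int)."""
    fp = libc.fdopen(fd, b"w")
    if not fp:  # a NULL pointer comes back as None
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return fp


def write_numbers(a, b):
    rd, wr = os.pipe()
    try:
        fp = fdopen_w(wr)
    except OSError:
        os.close(rd)
        os.close(wr)
        raise
    try:
        # Stand-in for mylib.write_numbers(fp, a, b).
        libc.fprintf(ctypes.c_void_p(fp), b"%d %d\n",
                     ctypes.c_int(a), ctypes.c_int(b))
    finally:
        # fclose() flushes stdio's buffer AND closes wr; no os.close(wr) needed.
        libc.fclose(fp)
    with os.fdopen(rd, "rb") as reader:
        return reader.read()


print(write_numbers(1, 2))
```

The key difference from the question's sketch is fclose() instead of os.close(wr): closing only the descriptor would leave unflushed data in stdio's buffer and leak the FILE structure, which matters in a "forever" process. Also note that the reader only starts after the writer finishes, so output larger than the pipe's capacity (64 KiB by default on Linux) would block forever; a temporary file (fdopen() an os.dup() of its descriptor, fclose() it, then seek back and read) avoids that.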


Second, you can write a small helper module, compiled against the same (or a guaranteed-compatible) runtime as the library, that does the conversion from the aforementioned descriptor/handle for you. This effectively substitutes for editing the library proper.


Finally, there's the simplest (and the dirtiest) method: using Python's own C runtime instance (so all the above warnings apply in full) through the Python C API, which is available via ctypes.pythonapi. It takes advantage of

  • the fact that Python 2's file objects are wrappers over stdio's FILE* (Python 3's are not)
  • the PyFile_AsFile API, which returns the wrapped FILE* (note that it's missing from Python 3)
    • for a standalone fd, you need to construct a file object first (e.g. with os.fdopen()), so that there is a FILE* to return
  • the fact that id() of an object is its memory address (CPython-specific)[2]

    >>> open("test.txt")
    <open file 'test.txt', mode 'r' at 0x017F8F40>
    >>> f=_
    >>> f.fileno()
    3
    >>> ctypes.pythonapi
    <PyDLL 'python dll', handle 1e000000 at 12808b0>
    >>> api=_
    >>> api.PyFile_AsFile
    <_FuncPtr object at 0x018557B0>
    >>> api.PyFile_AsFile.restype=ctypes.c_void_p      # as per ctypes docs, pythonapi assumes all fns return int by default
    >>> api.PyFile_AsFile.argtypes=(ctypes.c_void_p,)   # as of 2.7.10, long integers are silently truncated to ints, see http://bugs.python.org/issue24747
    >>> api.PyFile_AsFile(id(f))
    2019259400
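
Since PyFile_AsFile is missing from Python 3, code relying on the trick above may want to check for it before using it. A small sketch, assuming you fall back to a C-runtime fdopen() otherwise (the function name pyfile_asfile_available() is hypothetical):

```python
import ctypes
import sys


def pyfile_asfile_available():
    """True if this interpreter exports the Python 2-only PyFile_AsFile()."""
    try:
        ctypes.pythonapi.PyFile_AsFile
    except AttributeError:  # ctypes raises this for a symbol it can't find
        return False
    return True


if pyfile_asfile_available():
    strategy = "PyFile_AsFile(id(f))"
else:
    strategy = "fdopen() from the C runtime"
print(sys.version_info[0], strategy)
```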
    

Do keep in mind that with fds and C pointers, you need to ensure proper object lifetimes by hand!

  • file-like objects returned by os.fdopen() do close the descriptor on .close()
    • so duplicate descriptors with os.dup() if you need them after a file object is closed/garbage collected
  • while working with the C structure, adjust the file object's use count with PyFile_IncUseCount()/PyFile_DecUseCount() (and keep a reference to the object so it isn't garbage-collected meanwhile).
  • ensure no other I/O is done on the descriptors/file objects in the meantime, since it would corrupt the data (e.g. once you call iter(f) or loop with for l in f, the file object does its own read-ahead caching, independent of stdio's buffering)
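
A minimal sketch of the first bullet and its sub-bullet, in Python 3 syntax: a descriptor handed to os.fdopen() is closed together with the file object, while an os.dup() copy made beforehand stays valid. The pipe and the function name dup_survives_close() are only for illustration.

```python
import os


def dup_survives_close():
    rd, wr = os.pipe()
    keep = os.dup(wr)              # a second descriptor for the same pipe end
    with os.fdopen(wr, "wb") as f:
        f.write(b"hello ")
    # Leaving the with-block closed f and, with it, descriptor wr ...
    os.write(keep, b"world")       # ... but the duplicate is still usable.
    os.close(keep)                 # last write end closed, so the reader sees EOF
    with os.fdopen(rd, "rb") as r:
        return r.read()


print(dup_survives_close())
```

Closing keep explicitly matters: as long as any write end of the pipe stays open, the final read() would block instead of returning.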
ivan_pozdeev, answered Nov 08 '22 11:11