
fast binary data conversion in Python

Tags:

python

What is the fastest method for converting a binary data string to a numeric value in Python?

I am using struct.unpack_from(), but am hitting a performance limit.

Context: an incoming stream is mixed binary and ASCII data. The ASCII data conversion is done in C through ctypes. Implementing the unpacking in C through ctypes yielded similar performance to unpack. My guess is that the call overhead was too much of a factor. I was hoping to find a native C-like coercion method (however un-Pythonic). Most likely all of this code will need to move to C.

The stream is in network byte order (big-endian) and the machine is little-endian. An example conversion would be:

import struct
network_stream = struct.pack('>I', 0x12345678)
(converted_int,) = struct.unpack_from('>I', network_stream, 0) 

I am less concerned with handling the stream format than with the general case of binary conversion, and whether there is even an alternative to unpack. For example, socket.ntohl() requires an int, and int() won't convert a binary data string.

Thanks for your suggestions!

CNK asked Nov 16 '11 00:11



2 Answers

The speed problem probably lies not in the implementation of struct.unpack_from() itself, but in everything else Python needs to do: looking up names in dictionaries, creating objects, calling functions, and other tasks. You can speed things up ever so slightly by eliminating one of these dictionary lookups, importing unpack_from directly rather than getting it from the struct module each time:

$ python -m timeit -s "import struct; network_stream = struct.pack('>I', 0x12345678)" "(converted_int,) = struct.unpack_from('>I', network_stream, 0)" 
1000000 loops, best of 3: 0.277 usec per loop

$ python -m timeit -s "import struct; from struct import unpack_from; network_stream = struct.pack('>I', 0x12345678)" "(converted_int,) = unpack_from('>I', network_stream, 0)"
1000000 loops, best of 3: 0.258 usec per loop

However, if the parsing logic forces you to unpack one number at a time and keeps you from unpacking a whole array of data in bulk, it doesn't matter much which function you call to do it. You will probably need to move this whole inner loop to a language with less overhead, such as C.
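To make the bulk-versus-one-at-a-time distinction concrete, here is a minimal sketch. It assumes Python 3 and a made-up payload of four consecutive big-endian 32-bit unsigned integers; the names `payload` and `FOUR_UINT32` are hypothetical and not from the question.

```python
import struct

# Hypothetical payload: four big-endian 32-bit unsigned integers back to back.
values = (0x12345678, 1, 2, 0xFFFFFFFF)
payload = struct.pack('>4I', *values)

# Compile the format once up front and reuse the resulting object.
FOUR_UINT32 = struct.Struct('>4I')

# Bulk: a single call converts all four numbers.
bulk = FOUR_UINT32.unpack_from(payload, 0)

# One at a time: one call (and one result tuple) per number.
one_by_one = tuple(
    struct.unpack_from('>I', payload, offset)[0]
    for offset in range(0, len(payload), 4)
)
```

Both forms produce the same tuple; the bulk form simply pays Python's per-call overhead once instead of once per number. Whether that helps depends on whether the stream layout is known far enough ahead to group the fields.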

Michael Hoffman answered Oct 23 '22 19:10



Based on my experience, you are correct that the code will need to be moved to C. As you discovered, the various tools for binary conversion (struct and ctypes, for example) have roughly similar performance.
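For reference, a small sketch of a few pure-Python routes to the same conversion, assuming Python 3. `int.from_bytes` (Python 3.2+) is not mentioned in the question and is included here only as an assumption about what is available; `BigEndianUInt32` is a hypothetical helper type. The sketch only checks that the results agree and makes no claim about relative speed.

```python
import ctypes
import struct

network_stream = struct.pack('>I', 0x12345678)

# struct: the approach from the question.
(via_struct,) = struct.unpack_from('>I', network_stream, 0)

# int.from_bytes (Python 3.2+): read the raw bytes as a big-endian integer.
via_int = int.from_bytes(network_stream[0:4], 'big')


# ctypes: a big-endian structure filled from a copy of the bytes.
class BigEndianUInt32(ctypes.BigEndianStructure):  # hypothetical helper type
    _fields_ = [('value', ctypes.c_uint32)]


via_ctypes = BigEndianUInt32.from_buffer_copy(network_stream).value
```

Each route still goes through an interpreter-level call per value, so timing them with timeit on the target interpreter is the only way to see whether any of them is meaningfully faster for a given workload.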

Cython is the easiest way to generate a C extension for Python.

Another easy approach is to abandon CPython entirely in favor of PyPy, which can generate high-quality, low-level code using its tracing JIT.

A more challenging but more direct approach is to write a plain C extension. This isn't fun, but it isn't difficult.

Raymond Hettinger answered Oct 23 '22 21:10
