I'm looking for a clean and simple way to read a null-terminated C string from a file or file-like object in Python. It should either not consume more input from the file than it needs, or push the excess back onto whatever file/buffer it works with, so that other code can read the data immediately after the null-terminated string.
I've seen a bit of rather ugly code to do it, but not much that I'd like to use.
Universal newlines support only works for open()ed files, not StringIO objects etc., and doesn't look like it handles unconventional newlines. Also, if it did work it'd result in strings with \n appended, which is undesirable.
struct doesn't look like it supports reading arbitrary-length C strings at all, requiring a length as part of the format.
ctypes has c_buffer, which can be constructed from a byte string and will return the first null-terminated string as its value. Again, this requires determining how much must be read in advance, and it doesn't differentiate between null-terminated and unterminated strings. The same is true of c_char_p. So it doesn't seem to help much, since you already have to know you've read enough of the string and have to handle buffer splits.
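To illustrate the ctypes limitation, here is a minimal sketch. It uses ctypes.create_string_buffer (the documented name for the same kind of buffer); the variable names are only for illustration.

```python
import ctypes

# create_string_buffer copies the bytes and appends its own NUL terminator.
terminated = ctypes.create_string_buffer(b"abc\0def")
unterminated = ctypes.create_string_buffer(b"abc")

# .value stops at the first NUL, so both buffers report the same string,
# even though only the first input actually contained a terminator.
print(terminated.value)
print(unterminated.value)

# .raw exposes the whole buffer, including the NUL that ctypes appended.
print(terminated.raw)
```

Since both cases yield the same value, the caller still has to track whether a real terminator was seen in the input.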
The usual way to do this in C is to read chunks into a buffer, copying and resizing the buffer if needed, then check whether the newest chunk read contains a null byte. If it does, return everything up to the null byte and either realign the buffer or, if you're being fancy, keep on reading and use it as a ring buffer. (This only works if you can hand the excess data read back to the caller, or if your platform's ungetc lets you push a lot back onto the file, of course.)
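For reference, the chunked approach described above might look roughly like this in Python 3. This is only a sketch under assumptions: CStringReader, read_cstr, and chunk_size are hypothetical names, the wrapped object is assumed to be a blocking binary stream, and all later reads are assumed to go through the wrapper so the excess data is not lost.

```python
import io


class CStringReader:
    """Hypothetical wrapper that reads in chunks and keeps the leftovers."""

    def __init__(self, f, chunk_size=4096):
        self._f = f
        self._chunk_size = chunk_size
        self._buf = bytearray()  # bytes read from f but not yet handed out

    def read_cstr(self):
        """Return the bytes before the next NUL, or None if EOF comes first."""
        start = 0
        while True:
            # Only search the part of the buffer not scanned yet.
            idx = self._buf.find(b"\0", start)
            if idx >= 0:
                result = bytes(self._buf[:idx])
                del self._buf[:idx + 1]  # drop the string and its terminator
                return result
            start = len(self._buf)
            chunk = self._f.read(self._chunk_size)
            if not chunk:
                return None  # unterminated; leftover bytes stay buffered
            self._buf += chunk

    def read(self, n=-1):
        """Read ordinary data, draining buffered leftovers first."""
        if n < 0:
            data = bytes(self._buf) + self._f.read()
            self._buf.clear()
            return data
        data = bytes(self._buf[:n])
        del self._buf[:n]
        if len(data) < n:
            data += self._f.read(n - len(data))
        return data


reader = CStringReader(io.BytesIO(b"hello\0world\0tail"), chunk_size=4)
first = reader.read_cstr()
second = reader.read_cstr()
rest = reader.read()
```

The drawback is the one mentioned above: the underlying file position runs ahead of what has been consumed, so any code that bypasses the wrapper will miss the buffered bytes.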
Is it necessary to spell out similar code in Python? I was surprised not to find anything canned in io, ctypes or struct.
File objects don't seem to have a way to push back onto their buffer, like ungetc, and neither do buffered I/O streams in the io module.
I feel like I must be missing the obvious here. I'd really rather avoid byte-by-byte reading:
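    def readcstr(f):
        buf = bytearray()
        while True:
            b = f.read(1)
            # read() returns an empty string at EOF (None only for some
            # non-blocking streams), so test for falsiness, not just None.
            if not b or b == b'\0':
                return bytes(buf)
            buf += b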
but right now that's what I'm doing.
Incredibly mild improvement on what you have (mostly in that it uses more built-ins that, in CPython, are implemented in C, which usually runs faster):
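    import functools
    import itertools

    def readcstr(f):
        # Two-argument iter(): call f.read(1) until it returns b'' (EOF).
        toeof = iter(functools.partial(f.read, 1), b'')
        # takewhile consumes bytes up to and including the NUL, then stops.
        return b''.join(itertools.takewhile(b'\0'.__ne__, toeof))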
This is relatively ugly (and sensitive to the type of the file object; it won't work with file objects that return unicode), but pushes all the work to the C layer. The two-argument iter ensures you stop if the file is exhausted, while itertools.takewhile looks for (and consumes) the NUL terminator but no more; the final join then combines the bytes read into a single return value.
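If the file object happens to be an io.BufferedReader (for example, the result of open(path, 'rb') in Python 3), its peek() method offers another way to avoid one-byte reads without over-consuming. The following is only a sketch under that assumption; readcstr_peek is a hypothetical name, and it won't work on objects that lack peek(), such as a bare io.BytesIO.

```python
import io


def readcstr_peek(f):
    """Read up to and including the next NUL from a BufferedReader.

    Returns the bytes before the NUL. At EOF without a terminator it
    returns whatever was read so far.
    """
    parts = []
    while True:
        # peek() returns buffered bytes without advancing the position;
        # it may return more or fewer bytes than the amount requested.
        chunk = f.peek(1)
        if not chunk:
            return b"".join(parts)  # EOF, no terminator found
        idx = chunk.find(b"\0")
        if idx >= 0:
            # Consume the string plus its terminator, nothing more.
            parts.append(f.read(idx + 1)[:-1])
            return b"".join(parts)
        # No NUL in what is buffered: consume it all and look again.
        parts.append(f.read(len(chunk)))


stream = io.BufferedReader(io.BytesIO(b"hello\0world\0"))
name = readcstr_peek(stream)
following = stream.read(5)
```

Because only the bytes up to the terminator are actually read, other code can continue reading from the same stream right after the string.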