I'm having an issue parsing data after reading a file. What I'm doing is reading a binary file in and need to create a list of attributes from the read file all of the data in the file is terminated with a null byte. What I'm trying to do is find every instance of a null byte terminated attribute.
Essentially taking a string like
Health\x00experience\x00charactername\x00
and storing it in a list.
The real issue is I need to keep the null bytes in tact, I just need to be able to find each instance of a null byte and store the data that precedes it.
Python doesn't treat NUL bytes as anything special; they're no different from spaces or commas. So, split()
works fine:
>>> my_string = "Health\x00experience\x00charactername\x00"
>>> my_string.split('\x00')
['Health', 'experience', 'charactername', '']
Note that split
is treating \x00
as a separator, not a terminator, so we get an extra empty string at the end. If that's a problem, you can just slice it off:
>>> my_string.split('\x00')[:-1]
['Health', 'experience', 'charactername']
While it boils down to using split('\x00')
a convenience wrapper might be nice.
def readlines(f, bufsize):
buf = ""
data = True
while data:
data = f.read(bufsize)
buf += data
lines = buf.split('\x00')
buf = lines.pop()
for line in lines:
yield line + '\x00'
yield buf + '\x00'
then you can do something like
with open('myfile', 'rb') as f:
mylist = [item for item in readlines(f, 524288)]
This has the added benefit of not needing to load the entire contents into memory before splitting the text.
To check if string has NULL byte, simply use in
operator, for example:
if b'\x00' in data:
To find the position of it, use find()
which would return the lowest index in the string where substring sub is found. Then use optional arguments start and end for slice notation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With