Yesterday I had to parse a very simple binary data file. The rule is: look for two consecutive bytes that are both 0xAA; the next byte is a length byte; then skip 9 bytes and output that many bytes of data from there. Repeat until the end of the file.
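To make the record layout concrete, here is a small sketch that builds one record from a payload. `make_record`, `MARKER` and `SKIP` are hypothetical names, and the 9 skipped bytes are assumed to be zero filler since their content is not specified:

```python
MARKER = b"\xaa\xaa"  # two 0xAA bytes mark the start of a record
SKIP = 9              # bytes between the length byte and the payload


def make_record(payload, filler=b"\x00" * SKIP):
    """Build one record: marker, length byte, 9 skipped bytes, payload (hypothetical helper)."""
    if len(payload) > 255:
        raise ValueError("the length must fit in a single byte")
    if len(filler) != SKIP:
        raise ValueError("filler must be exactly %d bytes" % SKIP)
    return MARKER + bytes([len(payload)]) + filler + payload


record = make_record(b"\x01\x02\x03")
print(record.hex())  # 15 bytes: 2 marker + 1 length + 9 skipped + 3 payload
```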
My solution did work, and it was very quick to put together (even though I am a C programmer at heart, I still think it was quicker for me to write this in Python than it would have been in C). But it is clearly not at all Pythonic, and it reads like a C program (and not a very good one at that!).
What would be a better / more Pythonic approach to this? Is a simple FSM like this even still the right choice in Python?
My solution:
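```python
#!/usr/bin/env python3
import sys

state = 0
# A with-block closes the file; open() raises on failure, so no "if f:" check is needed
with open(sys.argv[1], "rb") as f:
    for a in f.read():  # iterating over bytes yields ints, so ord() is not needed
        if state == 0:  # waiting for the first 0xAA
            if a == 0xAA:
                state = 1
        elif state == 1:  # waiting for the second 0xAA
            if a == 0xAA:
                state = 2
            else:
                state = 0
        elif state == 2:  # this byte is the data length
            count = a
            skip = 9
            state = 3
        elif state == 3:  # skip the 9 header bytes
            skip -= 1
            if skip == 0:
                # a zero-length record has no data; go straight back to scanning
                state = 4 if count else 0
        elif state == 4:  # output `count` data bytes in hex
            print("%02x" % a)
            count -= 1
            if count == 0:
                state = 0
                print()  # blank line between records
```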
django-fsm adds declarative state management for Django models. Instead of adding a state field to a Django model and managing its values by hand, you can use an FSMState field and mark model methods with the `transition` decorator. The decorated method contains the side effects of the transition.
You could give your states constant names instead of using 0, 1, 2, etc. for improved readability.
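A minimal sketch of the question's FSM with named states, using the standard `enum` module. The state names are my own choice, and `parse` collects payloads in a list instead of printing them so it is easier to test; the transitions otherwise follow the original code:

```python
from enum import Enum, auto


class State(Enum):
    """Named parser states replacing the magic numbers 0-4 (names are illustrative)."""
    FIND_FIRST_AA = auto()
    FIND_SECOND_AA = auto()
    READ_LENGTH = auto()
    SKIP_HEADER = auto()
    READ_DATA = auto()


def parse(data, skip_len=9):
    """Run the FSM over `data` (a bytes object) and return a list of payloads."""
    state = State.FIND_FIRST_AA
    records = []
    for byte in data:  # iterating over bytes yields ints
        if state is State.FIND_FIRST_AA:
            if byte == 0xAA:
                state = State.FIND_SECOND_AA
        elif state is State.FIND_SECOND_AA:
            state = State.READ_LENGTH if byte == 0xAA else State.FIND_FIRST_AA
        elif state is State.READ_LENGTH:
            count, skip = byte, skip_len
            state = State.SKIP_HEADER
        elif state is State.SKIP_HEADER:
            skip -= 1
            if skip == 0:
                if count:
                    current = bytearray()
                    state = State.READ_DATA
                else:  # zero-length record: nothing to read
                    records.append(b"")
                    state = State.FIND_FIRST_AA
        elif state is State.READ_DATA:
            current.append(byte)
            if len(current) == count:
                records.append(bytes(current))
                state = State.FIND_FIRST_AA
    return records
```

Returning the payloads instead of printing them keeps the parsing logic separate from the output format.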
You could use a dictionary to map `(current_state, input) -> next_state`, but that doesn't really let you do any additional processing during the transitions, unless you also include a "transition function" to do the extra processing.
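A sketch combining both ideas, under my own assumptions about how to split the work: a plain `(current_state, input) -> next_state` dict handles the marker-matching states (with the input simplified to "is this byte 0xAA?"), and a second dict maps the states that need extra processing to transition functions. `MARKER_TABLE`, `parse` and the state names are hypothetical:

```python
# Pure (current_state, input) -> next_state table for the marker-matching states.
MARKER_TABLE = {
    ("first_aa", True): "second_aa",
    ("first_aa", False): "first_aa",
    ("second_aa", True): "length",
    ("second_aa", False): "first_aa",
}


def parse(data, skip_len=9):
    """Table-driven parse; returns the list of payloads found in `data`."""
    records = []
    count = skip = 0
    buf = bytearray()

    def on_length(byte):  # remember the length and start skipping the header
        nonlocal count, skip
        count, skip = byte, skip_len
        return "skip"

    def on_skip(byte):  # consume header bytes, then start a fresh payload
        nonlocal skip, buf
        skip -= 1
        if skip:
            return "skip"
        buf = bytearray()
        if count == 0:  # zero-length record: nothing to read
            records.append(b"")
            return "first_aa"
        return "data"

    def on_data(byte):  # collect payload bytes until `count` are read
        buf.append(byte)
        if len(buf) < count:
            return "data"
        records.append(bytes(buf))
        return "first_aa"

    # Transition functions for states that do extra processing
    actions = {"length": on_length, "skip": on_skip, "data": on_data}
    state = "first_aa"
    for byte in data:
        if state in actions:
            state = actions[state](byte)
        else:
            state = MARKER_TABLE[(state, byte == 0xAA)]
    return records
```

The plain table covers the states whose next state depends only on the input byte; the counting states need the functions because their next state depends on counters rather than on the input.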
Or you could use a non-FSM approach. I think this will work as long as `0xAA 0xAA` only appears where it marks the start of a record (i.e. it never appears in the length byte, the skipped bytes or the data).
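```python
import sys

with open(sys.argv[1], 'rb') as f:
    contents = f.read()

# Anything before the first marker is ignored; each later chunk starts with a length byte.
for chunk in contents.split(b'\xaa\xaa')[1:]:
    if not chunk:                  # e.g. a marker at the end of the file: no length byte
        continue
    length = chunk[0]              # indexing bytes gives an int in Python 3
    data = chunk[10:10 + length]   # skip the length byte plus the 9 header bytes
    print(data.hex())              # hex output, similar to the question's FSM
```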
If it does appear in the data, you can instead use `contents.find(b'\xaa\xaa', start)` to scan through the data, setting the `start` argument to where the last data block ended. Repeat until it returns -1.
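The find-based scan can be sketched like this (function and constant names are hypothetical). Because each search resumes after the end of the previous record, a `0xAA 0xAA` inside the skipped bytes or the data is never mistaken for a record start:

```python
MARKER = b"\xaa\xaa"
SKIP = 9  # header bytes between the length byte and the data


def parse_with_find(contents):
    """Return the payloads in `contents`, searching for markers only between records."""
    records = []
    start = 0
    while True:
        pos = contents.find(MARKER, start)
        if pos == -1:
            break
        length_at = pos + len(MARKER)
        if length_at >= len(contents):  # marker at the very end: no length byte
            break
        length = contents[length_at]
        data_start = length_at + 1 + SKIP
        # A truncated final record simply yields a shorter slice
        records.append(contents[data_start:data_start + length])
        start = data_start + length  # resume the search after this record
    return records
```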