I'm writing a python script to search for several different byte strings within a large binary file and so far it works well but, I have hit a bit of an anomoly. Here is what I have done so far:
for i in range(0, fileSizeBytes):
data.seek(readOffsetIndex, 0) # Change the file index to last search.
print('Starting Read at DEC: %s' % str(readOffsetIndex))
print('Starting Read at HEX: %s' % str(hex(readOffsetIndex)))
byte = data.read() # Read the file starting at the new index
search = byte.find(b'\x00\x00\x00\xbb') # Search for this string of bytes
if search:
byteOffset = (byteOffset + (busWidth+4))
startOffset = str(hex(byteOffset-4))
readOffsetIndex = byteOffset
print('String Found Starting at: ' + startOffset)
print('READ SET TO: %s' % str(readOffsetIndex))
print('READ SET TO: %s' % str(hex(readOffsetIndex)))
print('---------------------------------------------------')
csvWriter.writerow(['Bus Width', str(startOffset), str(hex(readOffsetIndex)), grabData(byteOffset-4)])
if (readOffsetIndex >= fileSizeBytes): # Check bounds of file size to kill loop
csvFile.close()
break
The only query it is attempying to find is: search = byte.find(b'\x00\x00\x00\xbb'). When I analyze the data, the few records are perfect but when I hit search location 0x189da6b, it goes crazy on me. See the image below for the data output:
It's like is just stops looking for the specific string and starts doing its own thing... Any ideas as to why this is happening? The CSV has a total of 88,900 rows of which, about 90 are valid search strings and the rest are the jibbereist you see in the data.
UPDATE #1:
I have found a better way to interate through a Binary file and locate all occurances of a byte string along with the offset of said byte string. Below is a method that does just that:
from bitstring import ConstBitStream
def parse(register_name,byte_data):
fileSizeBytes = os.path.getsize(bin_file)
fileSizeMegaBytes = GetFileSize(os.path.getsize(bin_file))
data = open(bin_file, 'rb')
s = ConstBitStream(filename=bin_file)
occurances = s.findall(byte_data, bytealigned=True)
occurances = list(occurances)
totalOccurances = len(occurances)
byteOffset = 0 # True start of Byte string
for i in range(0, len(occurances)):
occuranceOffset = (hex(int(occurances[i]/8)))
s0f0, length, bitdepth, height, width = s.readlist('hex:16, uint:16, uint:8, 2*uint:16')
s.bitpos = occurances[i]
data = s.read('hex:32')
print('Address: ' + str(occuranceOffset) + ' Data: ' + str(data))
csvWriter.writerow([register_name, str(occuranceOffset), str(data)])
From the documentation
bytes.find(sub[, start[, end]])
Return the lowest index in the data where the subsequence sub is found... Return -1 if sub is not found.
When the find can't locate the substring it will return -1 but in the if
statement the -1
is converted into true
and your code is executed. Rewrite the condition to the if search != -1:
and it should start working.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With