Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Searching for all Occurance of Byte Strings in Binary File

I'm writing a python script to search for several different byte strings within a large binary file and so far it works well but, I have hit a bit of an anomoly. Here is what I have done so far:

for i in range(0, fileSizeBytes):
        data.seek(readOffsetIndex, 0)                                   # Change the file index to last search. 
        print('Starting Read at DEC: %s' % str(readOffsetIndex))
        print('Starting Read at HEX: %s' % str(hex(readOffsetIndex)))
        byte = data.read()                                              # Read the file starting at the new index
        search = byte.find(b'\x00\x00\x00\xbb')                       # Search for this string of bytes

        if search:
            byteOffset = (byteOffset + (busWidth+4))
            startOffset = str(hex(byteOffset-4))
            readOffsetIndex = byteOffset
            print('String Found Starting at: ' + startOffset)
            print('READ SET TO: %s' % str(readOffsetIndex))
            print('READ SET TO: %s' % str(hex(readOffsetIndex)))
            print('---------------------------------------------------')
            csvWriter.writerow(['Bus Width', str(startOffset), str(hex(readOffsetIndex)), grabData(byteOffset-4)])

        if (readOffsetIndex >= fileSizeBytes):                          # Check bounds of file size to kill loop
            csvFile.close()
            break

The only query it is attempying to find is: search = byte.find(b'\x00\x00\x00\xbb'). When I analyze the data, the few records are perfect but when I hit search location 0x189da6b, it goes crazy on me. See the image below for the data output:

enter image description here

It's like is just stops looking for the specific string and starts doing its own thing... Any ideas as to why this is happening? The CSV has a total of 88,900 rows of which, about 90 are valid search strings and the rest are the jibbereist you see in the data.


UPDATE #1:

I have found a better way to interate through a Binary file and locate all occurances of a byte string along with the offset of said byte string. Below is a method that does just that:

from bitstring import ConstBitStream

def parse(register_name,byte_data):
    fileSizeBytes = os.path.getsize(bin_file)
    fileSizeMegaBytes = GetFileSize(os.path.getsize(bin_file))
    data = open(bin_file, 'rb')

    s = ConstBitStream(filename=bin_file)
    occurances = s.findall(byte_data, bytealigned=True)
    occurances = list(occurances)
    totalOccurances = len(occurances)
    byteOffset = 0                                                      # True start of Byte string 

    for i in range(0, len(occurances)):
        occuranceOffset = (hex(int(occurances[i]/8)))
        s0f0, length, bitdepth, height, width = s.readlist('hex:16, uint:16, uint:8, 2*uint:16')  
        s.bitpos = occurances[i]
        data = s.read('hex:32')      
        print('Address: ' + str(occuranceOffset) + ' Data: ' + str(data))
        csvWriter.writerow([register_name, str(occuranceOffset), str(data)])
like image 337
Joshua Faust Avatar asked Mar 08 '18 20:03

Joshua Faust


1 Answers

From the documentation

bytes.find(sub[, start[, end]])

Return the lowest index in the data where the subsequence sub is found... Return -1 if sub is not found.

When the find can't locate the substring it will return -1 but in the if statement the -1 is converted into true and your code is executed. Rewrite the condition to the if search != -1: and it should start working.

like image 174
Qeek Avatar answered Nov 17 '22 10:11

Qeek