
Searching/reading binary data in Python

I'm reading in a binary file (a jpg in this case) and need to find some values in it. For those interested, I'm attempting to pick out the image's dimensions by looking for the binary structure as detailed here.

I need to find FFC0 in the binary data, skip ahead some number of bytes, and then read 4 bytes (this should give me the image dimensions).

What's a good way of searching for the value in the binary data? Is there an equivalent of 'find', or something like re?

Parand, asked Jul 10 '10 00:07




2 Answers

You could actually load the file into a string and search that string for the byte sequence 0xffc0 using the str.find() method, which returns the offset of the first match, or -1 if there is none. It works for any byte sequence.

The code to do this depends on a couple things. If you open the file in binary mode and you're using Python 3 (both of which are probably best practice for this scenario), you'll need to search for a byte string (as opposed to a character string), which means you have to prefix the string with b.

    with open(filename, 'rb') as f:
        s = f.read()
    s.find(b'\xff\xc0')
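To get from that offset to the dimensions, one option is the struct module. This is a minimal sketch, assuming the usual baseline layout after the marker: a 2-byte segment length, a 1-byte sample precision, then a 2-byte height and a 2-byte width, all big-endian. The function name `jpeg_size_from_bytes` is made up for illustration. Keep in mind that a plain find() can also match FFC0 inside another segment (an embedded thumbnail, for instance), and progressive JPEGs use FFC2 instead, so treat it as a starting point rather than a robust parser.

```python
import struct

def jpeg_size_from_bytes(data):
    """Return (width, height) from the first FFC0 segment, or None if absent."""
    pos = data.find(b'\xff\xc0')
    # Need 9 bytes from the marker: marker (2) + length (2) + precision (1)
    # + height (2) + width (2).
    if pos == -1 or pos + 9 > len(data):
        return None
    # Skip marker, segment length and precision (5 bytes in total), then read
    # height and width as big-endian unsigned 16-bit integers.
    height, width = struct.unpack_from('>HH', data, pos + 5)
    return width, height
```

With the first snippet's variable, that would be called as `jpeg_size_from_bytes(s)`.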

If you open the file in text mode in Python 3, you'd have to search for a character string:

    with open(filename, 'r') as f:
        s = f.read()
    s.find('\xff\xc0')

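though there's no particular reason to do this. It doesn't get you any advantage over the previous way, and it can cause problems: in text mode Python 3 decodes the file's bytes as text, which can fail with a UnicodeDecodeError on arbitrary binary data, and if you're on a platform that treats binary files and text files differently (e.g. Windows), there is a further chance of trouble.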

Python 2 doesn't make the distinction between byte strings and character strings, so if you're using that version, it doesn't matter whether you include or exclude the b in b'\xff\xc0'. And if your platform treats binary files and text files identically (e.g. Mac or Linux), it doesn't matter whether you use 'r' or 'rb' as the file mode either. But I'd still recommend using something like the first code sample above just for forward compatibility - in case you ever do switch to Python 3, it's one less thing to fix.
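As for something like re: in Python 3 the re module also works on bytes, as long as the pattern is a bytes object too. Here is a small sketch that matches the marker and captures the four dimension bytes in one go, assuming that the marker is followed by a 2-byte length and a 1-byte precision, then a big-endian 2-byte height and 2-byte width. `SOF0` and `find_sof0` are illustrative names, not part of any library.

```python
import re
import struct

# FFC0 marker, then 2 length bytes and 1 precision byte (skipped),
# then 4 captured bytes: height and width.
# re.DOTALL makes '.' match every byte, including 0x0A (newline).
SOF0 = re.compile(rb'\xff\xc0.{3}(.{4})', re.DOTALL)

def find_sof0(data):
    """Return (offset, width, height) for the first match, or None."""
    m = SOF0.search(data)
    if m is None:
        return None
    height, width = struct.unpack('>HH', m.group(1))
    return m.start(), width, height
```

For a single fixed marker, find() is simpler; a pattern mostly pays off when you want to match several markers or capture the following bytes at the same time.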

David Z, answered Oct 02 '22 11:10


The bitstring module was designed for pretty much this purpose. For your case the following code (which I haven't tested) should help illustrate:

    from bitstring import ConstBitStream

    # Can initialise from files, bytes, etc.
    s = ConstBitStream(filename='your_file')
    # Search to Start of Frame 0 code on byte boundary
    found = s.find('0xffc0', bytealigned=True)
    if found:
        # find() returns a tuple holding the bit position; divide by 8 for bytes
        print("Found start code at byte offset %d." % (found[0] // 8))
        s0f0, length, bitdepth, height, width = s.readlist(
            'hex:16, uint:16, uint:8, 2*uint:16')
        print("Width %d, Height %d" % (width, height))
Scott Griffiths, answered Oct 02 '22 11:10