Getting Raw Binary Representation of a file in Python

I'd like to get the exact sequence of bits from a file into a string using Python 3. There are several questions on this topic which come close, but don't quite answer it. So far, I have this:

>>> data = open('file.bin', 'rb').read()
>>> data
b'\xa1\xa7\xda4\x86G\xa0!e\xab7M\xce\xd4\xf9\x0e\x99\xce\xe94Y3\x1d\xb7\xa3d\xf9\x92\xd9\xa8\xca\x05\x0f$\xb3\xcd*\xbfT\xbb\x8d\x801\xfanX\x1e\xb4^\xa7l\xe3=\xaf\x89\x86\xaf\x0e8\xeeL\xcd|*5\xf16\xe4\xf6a\xf5\xc4\xf5\xb0\xfc;\xf3\xb5\xb3/\x9a5\xee+\xc5^\xf5\xfe\xaf]\xf7.X\x81\xf3\x14\xe9\x9fK\xf6d\xefK\x8e\xff\x00\x9a>\xe7\xea\xc8\x1b\xc1\x8c\xff\x00D>\xb8\xff\x00\x9c9...'

>>> bin(data[:][0])
'0b11111111'

OK, I can get a base-2 number, but I don't understand why data[:][x] is needed, and I still have the leading 0b. It also seems that I would have to loop through the whole string and do some casting and parsing to get the correct output. Is there a simpler way to get just the sequence of 0s and 1s without looping, parsing, and concatenating strings?

Thanks in advance!

maximus asked Jan 23 '11 17:01


4 Answers

I would first precompute the string representation for all values 0..255

bytetable = [("00000000"+bin(x)[2:])[-8:] for x in range(256)]

or, if you prefer bits in LSB to MSB order

bytetable = [("00000000"+bin(x)[2:])[-1:-9:-1] for x in range(256)]

then the whole file in binary can be obtained with

binrep = "".join(bytetable[x] for x in open("file", "rb").read())
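
As a rough sketch of the same lookup-table idea, the table can also be built with format() instead of string slicing. The sample bytes below are made up for illustration and stand in for the contents of the file.

```python
# Lookup table: the index is the byte value, the entry is its 8-digit binary form.
bytetable = [format(x, "08b") for x in range(256)]

# Hypothetical sample data standing in for open("file", "rb").read().
sample = bytes([0xA1, 0x00, 0x0F])

# Iterating over a bytes object in Python 3 yields ints, which index the table.
binrep = "".join(bytetable[b] for b in sample)
print(binrep)
```

The join still loops over every byte internally, but the per-byte formatting work is done only 256 times, when the table is built.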

6502 answered Oct 27 '22 01:10


If you are OK using an external module, this uses bitstring:

>>> import bitstring
>>> bitstring.BitArray(filename='file.bin').bin
'110000101010000111000010101001111100...'

and that's it. It just makes the binary string representation of the whole file.

Scott Griffiths answered Oct 27 '22 00:10


It is not quite clear what the sequence of bits is meant to be. I think it would be most natural to start at byte 0 with bit 0, but it actually depends on what you want.

So here is some code to access the sequence of bits starting with bit 0 in byte 0:

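def bits_from_char(c):
    # In Python 3, iterating over bytes yields ints, so ord() is only
    # needed when c is a length-1 str or bytes object.
    i = c if isinstance(c, int) else ord(c)
    for dummy in range(8):
        yield i & 1  # least significant bit first
        i >>= 1

def bits_from_data(data):
    for c in data:
        for bit in bits_from_char(c):
            yield bit

for bit in bits_from_data(data):
    pass  # process bit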

(Another note: you would not need data[:][0] in your code. Simply data[0] would do the trick, but without copying the whole string first.)
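
To illustrate the note about indexing, here is a small sketch, assuming Python 3 and a made-up two-byte sample rather than the asker's file. It shows that the full slice adds nothing, and that format() gives the eight digits without the 0b prefix.

```python
sample = bytes([0xA1, 0xA7])  # made-up sample, not the asker's file

via_slice = sample[:][0]  # full slice first, then index
direct = sample[0]        # same value; indexing bytes gives an int in Python 3
print(via_slice == direct)

# format() with "08b" pads to eight digits and omits the "0b" prefix.
msb_first = format(direct, "08b")
lsb_first = msb_first[::-1]  # bit 0 first, the order bits_from_char yields
print(msb_first, lsb_first)
```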

Sven Marnach answered Oct 27 '22 01:10


To convert raw binary data such as b'\xa1\xa7\xda4\x86' into a bit string that represents the data as a single number in the binary system (base 2) in Python 3:

>>> data = open('file.bin', 'rb').read()
>>> bin(int.from_bytes(data, 'big'))[2:]
'1010000110100111110110100011010010000110...'

See Convert binary to ASCII and vice versa.
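
One caveat with treating the data as a single integer is that bin() drops leading zero bits, so a file whose first byte is below 0x80 yields fewer than eight digits per byte. The sketch below pads the result back to full length; bits_of is a hypothetical helper name and the sample input is made up.

```python
def bits_of(data: bytes) -> str:
    """Return data as a bit string, eight digits per byte, MSB first."""
    if not data:
        return ""
    # zfill restores the leading zero bits that bin() drops.
    return bin(int.from_bytes(data, "big"))[2:].zfill(8 * len(data))

sample = b"\x00\x0f"  # made-up input whose first byte is zero
print(bin(int.from_bytes(sample, "big"))[2:])  # leading zeros are lost here
print(bits_of(sample))
```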

jfs answered Oct 27 '22 01:10