
How to read binary files as hex in Python?

I want to read a file with data, coded in hex format:

01ff0aa121221aff110120...etc

The files contain more than 100,000 such bytes, some more than 1,000,000 (they come from DNA sequencing).

I tried the following code (and other similar):

filele=1234563
f=open('data.geno','r')
c=[]
for i in range(filele):
  a=f.read(1)
  b=a.encode("hex")
  c.append(b)
f.close()

This gives each byte separately ("aa", "01", "f1", etc.), which is perfect for me!

This works fine up to (in this case) byte number 905, which happens to be "1a". I also tried the ord() function, which stopped at the same byte.

Is there a simple solution?

Per Persson asked Jan 08 '16 23:01



1 Answer

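The read most likely stops at "1a" because the file is opened in text mode ('r'): on Windows, text mode treats byte 0x1a (Ctrl-Z) as end of file. Open the file in binary mode instead. A simple solution is binascii: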

import binascii

# Open in binary mode (so you don't read two byte line endings on Windows as one byte)
# and use with statement (always do this to avoid leaked file descriptors, unflushed files)
with open('data.geno', 'rb') as f:
    # Slurp the whole file and efficiently convert it to hex all at once
    hexdata = binascii.hexlify(f.read())

This gives you a single str of hex digits, and it is much faster than reading and encoding one byte at a time. If you really want a list of two-character strings, one per byte, you can convert the result easily:

hexlist = map(''.join, zip(hexdata[::2], hexdata[1::2]))

which produces the list of two-character strs, one per byte. To avoid temporary copies of hexdata, a similar but slightly less intuitive approach skips the slicing by passing the same iterator to zip twice:

hexlist = map(''.join, zip(*[iter(hexdata)]*2))
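The one-liners above are written for Python 2. On Python 3, binascii.hexlify returns bytes and map returns an iterator, so the same pairing idea needs a decode step and an explicit list. A minimal sketch, assuming Python 3; the helper name pair_hex_digits and the sample bytes are made up for illustration:

```python
import binascii


def pair_hex_digits(hexdata):
    """Split a hex string such as '01ff1a' into ['01', 'ff', '1a']."""
    # Passing the same iterator to zip twice makes it pull two
    # characters per tuple, so no sliced copies are created.
    it = iter(hexdata)
    return [a + b for a, b in zip(it, it)]


# Sample bytes invented for this example; 0x1a is included on purpose.
raw = bytes([0x01, 0xFF, 0x0A, 0x1A])

# hexlify returns bytes on Python 3, so decode to get a str first.
hexdata = binascii.hexlify(raw).decode('ascii')

hexlist = pair_hex_digits(hexdata)
print(hexlist)
```

Because hexlify always emits an even number of digits, zip never drops a trailing character here.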

Update:

On Python 3.5 and higher, bytes objects have a .hex() method, so no module is needed to convert raw binary data to ASCII hex. The block of code at the top can be simplified to just:

with open('data.geno', 'rb') as f:
    hexdata = f.read().hex()
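If the goal is still one two-character string per byte, as in the question, Python 3 can build that list straight from the bytes object without going through one long hex string. A rough sketch, assuming Python 3; read_hex_bytes is a hypothetical helper name, and the temporary file only stands in for data.geno with invented contents:

```python
import os
import tempfile


def read_hex_bytes(path):
    """Return one two-character hex string per byte in the file."""
    # Binary mode: no newline translation and no early end-of-file.
    with open(path, 'rb') as f:
        data = f.read()
    # Iterating over bytes yields ints on Python 3;
    # the '02x' format pads each value to two lowercase hex digits.
    return [format(b, '02x') for b in data]


# Throwaway file standing in for data.geno; the contents are made up.
with tempfile.NamedTemporaryFile(suffix='.geno', delete=False) as tmp:
    tmp.write(bytes([0x01, 0xFF, 0x1A, 0x0A, 0x21]))
    sample_path = tmp.name

try:
    hexlist = read_hex_bytes(sample_path)
finally:
    os.remove(sample_path)

print(hexlist)
```

On Python 3.8 and later, bytes.hex() also accepts a separator, so data.hex(' ').split() should give the same list.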
ShadowRanger answered Oct 16 '22 00:10
