I want to read a file with data, coded in hex format:
01ff0aa121221aff110120...etc
the files contains >100.000 such bytes, some more than 1.000.000 (they comes form DNA sequencing)
I tried the following code (and other similar):
filele=1234563
f=open('data.geno','r')
c=[]
for i in range(filele):
a=f.read(1)
b=a.encode("hex")
c.append(b)
f.close()
This gives each byte separate "aa" "01" "f1" etc, that is perfect for me!
This works fine up to (in this case) byte no 905 that happen to be "1a". I also tried the ord() function that also stopped at the same byte.
There might be a simple solution?
Simple solution is binascii
:
import binascii
# Open in binary mode (so you don't read two byte line endings on Windows as one byte)
# and use with statement (always do this to avoid leaked file descriptors, unflushed files)
with open('data.geno', 'rb') as f:
# Slurp the whole file and efficiently convert it to hex all at once
hexdata = binascii.hexlify(f.read())
This just gets you a str
of the hex values, but it does it much faster than what you're trying to do. If you really want a bunch of length 2 strings of the hex for each byte, you can convert the result easily:
hexlist = map(''.join, zip(hexdata[::2], hexdata[1::2]))
which will produce the list of len 2 str
s corresponding to the hex encoding of each byte. To avoid temporary copies of hexdata
, you can use a similar but slightly less intuitive approach that avoids slicing by using the same iterator twice with zip
:
hexlist = map(''.join, zip(*[iter(hexdata)]*2))
Update:
For people on Python 3.5 and higher, bytes
objects spawned a .hex()
method, so no module is required to convert from raw binary data to ASCII hex. The block of code at the top can be simplified to just:
with open('data.geno', 'rb') as f:
hexdata = f.read().hex()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With