The following code does not seem to read/write binary form correctly. It should read a binary file, bit-wise XOR the data and write it back to file. There are not any syntax errors but the data does not verify and I have tested the source data via another tool to confirm the xor key.
Update: per feedback in the comments, this is most likely due to the endianness of the system I was testing on.
xortools.py:
def four_byte_xor(buf, key):
out = ''
for i in range(0,len(buf)/4):
c = struct.unpack("=I", buf[(i*4):(i*4)+4])[0]
c ^= key
out += struct.pack("=I", c)
return out
Call to xortools.py:
from xortools import four_byte_xor
in_buf = open('infile.bin','rb').read()
out_buf = open('outfile.bin','wb')
out_buf.write(four_byte_xor(in_buf, 0x01010101))
out_buf.close()
It appears that I need to read bytes per answer. How would the function above incorporate into the following as the function above manipulate multiple bytes? Or Does it not matter? Do I need to use struct?
with open("myfile", "rb") as f:
byte = f.read(1)
while byte:
# Do stuff with byte.
byte = f.read(1)
For an example the following file has 4 repeating bytes, 01020304:
The data is XOR'd with a key of 01020304 which zeros the original bytes:
Here is an attempt with the original function, in this case 05010501 is the result which is incorrect:
First, open a file in binary write mode and then specify the contents to write in the form of bytes. Next, use the write function to write the byte contents to a binary file.
One of the most common tasks that you can do with Python is reading and writing files. Whether it's writing to a simple text file, reading a complicated server log, or even analyzing raw byte data, all of these situations require reading or writing a file.
"Binary" files are any files where the format isn't made up of readable characters. Binary files can range from image files like JPEGs or GIFs, audio files like MP3s or binary document formats like Word or PDF. In Python, files are opened in text mode by default.
Here's a relatively easy solution (tested):
import sys
from xortools import four_byte_xor
in_buf = open('infile.bin','rb').read()
orig_len = len(in_buf)
new_len = ((orig_len+3)//4)*4
if new_len > orig_len:
in_buf += ''.join(['x\00']*(new_len-orig_len))
key = 0x01020304
if sys.byteorder == "little": # adjust for endianess of processor
key = struct.unpack(">I", struct.pack("<I", key))[0]
out_buf = four_byte_xor(in_buf, key)
f = open('outfile.bin','wb')
f.write(out_buf[:orig_len]) # only write bytes that were part of orig
f.close()
What it does is pad the length of the data up to a whole multiple of 4 bytes, xor's that with the four-byte key, but then only writes out data that was the length of the original.
This problem was a little tricky because the byte-order of the data for a 4-byte key depends on your processor but is always written with the high-byte first, but the byte order of string or bytearrays is always written low-byte first as shown in your hex dumps. To allow the key to be specified as a hex integer, it was necessary to add code to conditionally compensate for the differing representations -- i.e. to allow the key's bytes can be specified in the same order as the bytes appearing in the hex dumps.
Try this function:
def four_byte_xor(buf, key):
outl = []
for i in range(0, len(buf), 4):
chunk = buf[i:i+4]
v = struct.unpack(b"=I", chunk)[0]
v ^= key
outl.append(struct.pack(b"=I", v))
return b"".join(outl)
I'm not sure you're actually taking the input by 4 bytes, but I didn't try to decipher it. This assumes your input is divisible by 4.
Edit, new function based in new input:
def four_byte_xor(buf, key):
key = struct.pack(b">I", key)
buf = bytearray(buf)
for offset in range(0, len(buf), 4):
for i, byte in enumerate(key):
buf[offset + i] = chr(buf[offset + i] ^ ord(byte))
return str(buf)
This could probably be improved, but it does provide the proper output.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With