Read and write binary file in Python

Tags:

python

The following code does not seem to read and write binary data correctly. It should read a binary file, XOR the data bitwise with a key, and write the result back to a file. There are no syntax errors, but the output does not verify, and I have checked the source data with another tool to confirm the XOR key.

Update: per feedback in the comments, this is most likely due to the endianness of the system I was testing on.

xortools.py:

import struct

def four_byte_xor(buf, key):
    out = ''
    for i in range(0,len(buf)/4):
        c = struct.unpack("=I", buf[(i*4):(i*4)+4])[0]
        c ^= key
        out += struct.pack("=I", c)
    return out

Call to xortools.py:

from xortools import four_byte_xor
in_buf = open('infile.bin','rb').read()
out_buf = open('outfile.bin','wb')
out_buf.write(four_byte_xor(in_buf, 0x01010101))
out_buf.close()

Per another answer, it appears that I need to read the file byte by byte. How would the function above, which manipulates multiple bytes at a time, fit into the following loop? Or does it not matter? Do I need to use struct?

with open("myfile", "rb") as f:
    byte = f.read(1)
    while byte:
        # Do stuff with byte.
        byte = f.read(1)

For example, the following file contains the 4-byte sequence 01020304, repeated:

[image: before XOR]

The data is XOR'd with a key of 01020304, which zeros the original bytes:

[image: after XOR]

Here is an attempt with the original function. In this case the result is 05010501, which is incorrect:

[image: incorrect XOR attempt]


asked Jul 13 '12 00:07

Astron

2 Answers

Here's a relatively easy solution (tested):

import struct
import sys
from xortools import four_byte_xor
in_buf = open('infile.bin','rb').read()
orig_len = len(in_buf)
new_len = ((orig_len+3)//4)*4
if new_len > orig_len:
    in_buf += b'\x00' * (new_len-orig_len)  # pad with NUL bytes
key = 0x01020304
if sys.byteorder == "little":  # adjust for endianness of processor
    key = struct.unpack(">I", struct.pack("<I", key))[0]
out_buf = four_byte_xor(in_buf, key)
f = open('outfile.bin','wb')
f.write(out_buf[:orig_len]) # only write bytes that were part of orig
f.close()

It pads the data up to a whole multiple of 4 bytes, XORs that with the four-byte key, and then writes out only as many bytes as the original contained.

This problem was a little tricky because two byte orders are involved. An integer key is always written with the high byte first in source code, but the order of its bytes in memory (and therefore what "=I" packs and unpacks) depends on your processor. The bytes of a string or bytearray, on the other hand, are always stored in file order, lowest address first, as shown in your hex dumps. To allow the key to be specified as a hex integer, it was necessary to add code that conditionally compensates for the differing representations, i.e. so that the key's bytes can be specified in the same order as the bytes appearing in the hex dumps.
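To make the byte-order point concrete, here is a small sketch, assuming Python 3. It packs the same integer key with an explicit big-endian (">I") and little-endian ("<I") format and XORs each against the bytes from the hex dump. The helper name xor_bytes is made up for this illustration and is not part of the code above.

```python
import struct

key = 0x01020304
data = b"\x01\x02\x03\x04"  # bytes in file order, as in the hex dump

big = struct.pack(">I", key)     # high byte first: 01 02 03 04
little = struct.pack("<I", key)  # low byte first:  04 03 02 01

def xor_bytes(a, b):
    """XOR two equal-length byte strings (hypothetical helper)."""
    return bytes(x ^ y for x, y in zip(a, b))

print(xor_bytes(data, big))     # b'\x00\x00\x00\x00'
print(xor_bytes(data, little))  # b'\x05\x01\x01\x05'
```

Only the big-endian packing lines the key's bytes up with the file's bytes, which is why "=I" gives different results on different machines.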


answered Oct 15 '22 23:10

martineau

Try this function:

import struct

def four_byte_xor(buf, key):
    outl = []
    for i in range(0, len(buf), 4):
        chunk = buf[i:i+4]
        v = struct.unpack(b"=I", chunk)[0]
        v ^= key
        outl.append(struct.pack(b"=I", v))
    return b"".join(outl)

I'm not sure your original code actually takes the input 4 bytes at a time, but I didn't try to decipher it. This version assumes the input length is a multiple of 4.

Edit: a new function based on the new input:
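A quick way to see why the length matters, assuming Python 3: struct.unpack with "=I" needs exactly 4 bytes and raises struct.error on a short trailing chunk. The wrapper name unpack_chunk is hypothetical and only serves this demonstration.

```python
import struct

def unpack_chunk(chunk):
    """Return the unsigned 32-bit value, or None if the chunk is not 4 bytes."""
    try:
        return struct.unpack("=I", chunk)[0]
    except struct.error:
        return None

full = unpack_chunk(b"\x01\x02\x03\x04")  # a complete 4-byte chunk
short = unpack_chunk(b"\x01\x02")         # a trailing 2-byte chunk

print(full is not None, short)  # True None
```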

import struct

def four_byte_xor(buf, key):
    key = bytearray(struct.pack(">I", key))  # key bytes, high byte first
    buf = bytearray(buf)                     # mutable copy of the input
    for offset in range(0, len(buf), 4):
        for i, byte in enumerate(key):
            buf[offset + i] ^= byte          # XOR in place
    return bytes(buf)

This could probably be improved, but it does provide the proper output.
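One possible improvement, sketched here under the assumption of Python 3, is to repeat the key with itertools.cycle so the input length no longer has to be a multiple of 4. The function name repeating_key_xor is made up for this example.

```python
import struct
from itertools import cycle

def repeating_key_xor(buf, key):
    """XOR buf with the 4 bytes of key (high byte first), repeated as needed."""
    key_bytes = struct.pack(">I", key)
    # zip stops at the end of buf, so a partial last group is handled too
    return bytes(b ^ k for b, k in zip(buf, cycle(key_bytes)))

data = b"\x01\x02\x03\x04" * 2 + b"\x01\x02"  # 10 bytes, not a multiple of 4
result = repeating_key_xor(data, 0x01020304)
print(result)  # ten zero bytes
```

Because XOR is its own inverse, applying the function a second time with the same key restores the original data.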

answered Oct 15 '22 22:10

Keith
