
Fastest bitwise xor between two multibyte binary data variables

What is the fastest way to implement the following logic:

def xor(data, key):
    l = len(key)

    buff = ""
    for i in range(0, len(data)):
        buff += chr(ord(data[i]) ^ ord(key[i % l]))
    return buff

In my case, the key is a 20-byte SHA-1 digest, and the data is binary data between 20 bytes and a few (1, 2, 3) megabytes long.
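For comparison, the same logic on Python 3 (an assumption; the question's code is Python 2) can be written as one generator expression, because indexing a `bytes` object already yields integers and `ord`/`chr` are not needed. `itertools.cycle` repeats the key instead of computing `i % l`. This is still a per-byte loop, so it should not be expected to be much faster than the original:

```python
from itertools import cycle

def xor(data: bytes, key: bytes) -> bytes:
    # Pair every data byte with the key byte at the same position,
    # restarting the key from its beginning whenever it runs out
    return bytes(d ^ k for d, k in zip(data, cycle(key)))
```

Since XOR is its own inverse, applying `xor` twice with the same key restores the original data.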


OK guys. Here's a 3.5 times faster implementation, which splits the data and key into chunks of 4, 2 or 1 bytes (in my case, most of the time they are 4-byte long integers):

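import struct

def xor(data, key):
    # Pick the widest chunk size (4, 2 or 1 bytes) that divides len(data)
    index = len(data) % 4
    size = (4, 1, 2, 1)[index]
    fmt_char = ('L', 'B', 'H', 'B')[index]
    key_len = len(key) // size
    data_len = len(data) // size
    key_fmt = "<" + str(key_len) + fmt_char
    data_fmt = "<" + str(data_len) + fmt_char

    # Unpack both buffers into tuples of unsigned integers
    key_list = struct.unpack(key_fmt, key)
    data_list = struct.unpack(data_fmt, data)

    # XOR each data chunk with the corresponding (repeating) key chunk
    result = []
    for i in range(data_len):
        result.append(key_list[i % key_len] ^ data_list[i])

    return struct.pack(data_fmt, *result)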

It uses a lot of memory, but in my case that's not a big deal.

Any ideas on how to make it a few more times faster? :-)
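The chunking trick is valid because XOR has no carries between bits: XORing two 32-bit integers produces the same bits as XORing their four bytes one by one, provided key and data are unpacked and packed with the same format. It also requires the key length to be a multiple of the chunk size, which holds for a 20-byte digest with chunks of 4, 2 or 1 bytes. A small check, assuming Python 3 and arbitrary sample bytes:

```python
import struct

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Reference: XOR byte by byte
    return bytes(x ^ y for x, y in zip(a, b))

def xor_words(a: bytes, b: bytes, fmt: str) -> bytes:
    # Same operation on one 32-bit word; "<L" and ">L" are both exactly 4 bytes
    (wa,) = struct.unpack(fmt, a)
    (wb,) = struct.unpack(fmt, b)
    return struct.pack(fmt, wa ^ wb)

a = b"\x12\x34\x56\x78"
b = b"\xff\x00\xf0\x0f"
for fmt in ("<L", ">L"):
    # Byte order does not matter as long as pack and unpack agree
    assert xor_words(a, b, fmt) == xor_bytes(a, b)
```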


OK, OK... NumPy did the job. It's just blazing fast:

import math
import numpy

def xor(data, key):
    # Repeat the key so that it matches the data length exactly
    key = (key * int(math.ceil(float(len(data)) / len(key))))[:len(data)]

    # Select the widest unsigned integer type whose size divides the data length
    for size in (8, 4, 2, 1):
        if len(data) % size == 0:
            break
    dt = numpy.dtype('<u%d' % size)  # '<u8', '<u4', '<u2' or '<u1'

    # XOR both buffers element-wise, then convert the result back to bytes
    return numpy.bitwise_xor(numpy.frombuffer(key, dtype=dt),
                             numpy.frombuffer(data, dtype=dt)).tobytes()

The initial implementation needed 8 min 50 s to process a gigabyte, the second one around 2 min 30 s, and the last one just... 10 s.
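Where NumPy is not available, one pure-Python alternative on Python 3 (not from this thread, and not benchmarked here) is to treat the tiled key and the data as two big integers and XOR them in a single operation, so the per-byte work happens inside the interpreter's C code rather than in a Python loop. A sketch, assuming Python 3 and `bytes` inputs; like the NumPy version, it builds a full-length copy of the key:

```python
def xor(data: bytes, key: bytes) -> bytes:
    n = len(data)
    # Repeat the key until it covers the data, then trim it to exactly n bytes
    reps = -(-n // len(key))  # ceiling division without floats
    tiled = (key * reps)[:n]
    # XOR both buffers as arbitrary-precision integers in one operation;
    # using the same byte order on both sides keeps byte i aligned with byte i
    x = int.from_bytes(data, "big") ^ int.from_bytes(tiled, "big")
    # to_bytes(n, ...) restores any leading zero bytes lost in the integer
    return x.to_bytes(n, "big")
```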

Thanks to everyone who contributed ideas and code. You're great, guys!

asked Apr 20 '11 18:04 by Nikolai Gorchilov


1 Answer

Not tested, and I don't know if it's faster.

It assumes that len(mystring) is a multiple of 4:

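import struct

def xor(digest, mystring):
    # Assumes a 20-byte digest and len(mystring) % 4 == 0
    s = struct.Struct("<L")

    # Split the 20-byte key into five 32-bit words
    v1 = memoryview(digest)
    tab1 = []
    for i in range(5):
        tab1.append(s.unpack_from(v1, i * 4)[0])

    # Split the data into 32-bit words without copying it first
    v2 = memoryview(mystring)
    tab2 = []
    for i in range(len(mystring) // 4):
        tab2.append(s.unpack_from(v2, i * 4)[0])

    # XOR word by word, one 20-byte block after another; the last block
    # may be incomplete, so stop at the first missing data word
    tab3 = []
    try:
        for i in range(len(mystring) // 20 + 1):
            for j in range(5):
                tab3.append(s.pack(tab2[i * 5 + j] ^ tab1[j]))
    except IndexError:
        pass
    return b"".join(tab3)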
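The same word-at-a-time idea can be written more compactly on Python 3.4+, where `struct.Struct.iter_unpack` walks a buffer in fixed-size steps. This sketch keeps the answer's assumptions (a 20-byte digest, data length a multiple of 4) and, like the answer, makes no claim about speed:

```python
import struct
from itertools import cycle

WORD = struct.Struct("<L")  # one 32-bit little-endian word

def xor(digest: bytes, mystring: bytes) -> bytes:
    # iter_unpack raises struct.error unless each length is a multiple of 4
    key_words = [w for (w,) in WORD.iter_unpack(digest)]
    data_words = (w for (w,) in WORD.iter_unpack(mystring))
    # XOR each data word with the key word at the same position, cycling the key
    return b"".join(WORD.pack(d ^ k) for d, k in zip(data_words, cycle(key_words)))
```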
answered Oct 08 '22 04:10 by Xavier Combelle
