What is the fastest way to implement the following logic:
    def xor(data, key):
        l = len(key)
        buff = ""
        for i in range(0, len(data)):
            buff += chr(ord(data[i]) ^ ord(key[i % l]))
        return buff
In my case the key is a 20-byte SHA-1 digest, and the data is binary data between 20 bytes and a few (1, 2, 3) megabytes long.
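For readers on Python 3, where binary data is `bytes` rather than `str`, the same logic can be sketched as below. This is only an illustration under that assumption: the name `xor_bytes` and the sample inputs are made up for the example. It also shows that applying the XOR twice with the same key gives back the original data.

```python
import hashlib

def xor_bytes(data, key):
    """XOR each byte of data with the key, repeating the key as needed."""
    key_len = len(key)
    return bytes(b ^ key[i % key_len] for i, b in enumerate(data))

# Hypothetical inputs: a 20-byte SHA-1 digest as the key, short binary data
key = hashlib.sha1(b"secret").digest()
data = b"some binary payload longer than the key"

encrypted = xor_bytes(data, key)
# XOR is its own inverse, so a second pass restores the original
assert xor_bytes(encrypted, key) == data
```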
UPDATE:
OK guys. Here's an implementation that is 3.5 times faster. It splits the data and the key into chunks of 4, 2 or 1 bytes (in my case, most of the time it's a 4-byte integer):
    import struct

    def xor(data, key):
        # Pick the widest chunk size (4, 2 or 1 bytes) that divides len(data)
        index = len(data) % 4
        size = (4, 1, 2, 1)[index]
        code = ('L', 'B', 'H', 'B')[index]
        # Integer division, so the counts stay ints on Python 3 as well
        key_len = len(key) // size
        data_len = len(data) // size
        key_fmt = "<" + str(key_len) + code
        data_fmt = "<" + str(data_len) + code
        key_list = struct.unpack(key_fmt, key)
        data_list = struct.unpack(data_fmt, data)
        result = []
        for i in range(data_len):
            result.append(key_list[i % key_len] ^ data_list[i])
        return struct.pack(data_fmt, *result)
It uses a lot of memory, but in my case that's not a big deal.
Any ideas on how to make it a few times faster still? :-)
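To make the chunk selection in the `struct` version easier to follow, here is a small sketch of just that step. The helper name `chunk_format` is made up for illustration; it mirrors the `(4, 1, 2, 1)` and `('L', 'B', 'H', 'B')` lookup tables and checks with `struct.calcsize` that the resulting format covers the whole input.

```python
import struct

def chunk_format(length):
    """Return (chunk size, struct format string) for `length` bytes of data."""
    index = length % 4
    size = (4, 1, 2, 1)[index]          # widest chunk that divides the length
    code = ('L', 'B', 'H', 'B')[index]  # matching struct type code
    return size, "<" + str(length // size) + code

for n in (20, 21, 22, 23):
    size, fmt = chunk_format(n)
    # The "<" prefix selects standard sizes, so the format spans exactly n bytes
    assert struct.calcsize(fmt) == n
    print(n, size, fmt)
```

Note that any odd length falls back to single bytes, so the speed-up only applies when the data length happens to be a multiple of 2 or 4.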
FINAL UPDATE:
OK, ok... numpy did the job. That's just blazing fast:
    import math
    import numpy

    def xor(data, key):
        # Repeat the key so that it matches the data length
        key = (key*int(math.ceil(float(len(data))/float(len(key)))))[:len(data)]
        # Select the widest integer size in bytes that divides the data length
        for i in (8, 4, 2, 1):
            if not len(data) % i:
                break
        if i == 8:
            dt = numpy.dtype('<u8')
        elif i == 4:
            dt = numpy.dtype('<u4')
        elif i == 2:
            dt = numpy.dtype('<u2')
        else:
            dt = numpy.dtype('u1')
        # frombuffer views the bytes as an integer array without copying them
        return numpy.bitwise_xor(numpy.frombuffer(key, dtype=dt), numpy.frombuffer(data, dtype=dt)).tobytes()
The initial implementation needed 8 min 50 sec to process a gigabyte, the second around 2 min 30 sec, and the last one just 10 sec.
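If you want to reproduce this kind of comparison, a rough timing harness could look like the sketch below. It is not the code used for the numbers above: the helper `seconds_per_megabyte` and the byte-wise reference `xor_loop` are assumptions made for the example (Python 3, standard library only), and any `xor(data, key)` function can be passed in instead.

```python
import os
import timeit

def xor_loop(data, key):
    """Reference byte-by-byte implementation (bytes in, bytes out)."""
    key_len = len(key)
    return bytes(b ^ key[i % key_len] for i, b in enumerate(data))

def seconds_per_megabyte(func, size=1 << 20, repeat=3):
    """Time func on `size` random bytes and scale the best run to 1 MiB."""
    data = os.urandom(size)
    key = os.urandom(20)  # stands in for a 20-byte SHA-1 digest
    best = min(timeit.repeat(lambda: func(data, key), number=1, repeat=repeat))
    return best * (1 << 20) / size

if __name__ == "__main__":
    print("%.3f s per MiB" % seconds_per_megabyte(xor_loop))
```

Taking the minimum over a few runs reduces noise from other processes; the absolute numbers will of course depend on the machine.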
Thanks to everyone who contributed ideas and code. You're great, guys!
Not tested, and I don't know if it's faster.
This assumes that len(mystring) is a multiple of 4:
    import struct

    def xor(hash, mystring):
        s = struct.Struct("<L")
        # The 20-byte hash as five 32-bit words
        v1 = memoryview(hash)
        tab1 = []
        for i in range(5):
            tab1.append(s.unpack_from(v1, i*4)[0])
        # The data as a list of 32-bit words
        v2 = memoryview(mystring)
        tab2 = []
        for i in range(len(mystring)//4):
            tab2.append(s.unpack_from(v2, i*4)[0])
        # XOR each data word with the hash word at the same position, cycling through the hash
        tab3 = []
        for i in range(len(tab2)):
            tab3.append(s.pack(tab1[i % 5] ^ tab2[i]))
        return b"".join(tab3)
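If the data length is not a multiple of 4, one possible way around that restriction is to XOR the full 32-bit words in bulk and the 0 to 3 leftover bytes one at a time. The sketch below is untested for speed, assumes Python 3 `bytes` and a key whose length is a multiple of 4 (true for a 20-byte SHA-1 digest); the name `xor_words` is made up for the example.

```python
import struct

def xor_words(data, key):
    """XOR data with a repeating key, 32 bits at a time where possible."""
    assert len(key) % 4 == 0, "key length must be a multiple of 4"
    n_words = len(data) // 4
    key_words = struct.unpack("<%dL" % (len(key) // 4), key)
    data_words = struct.unpack_from("<%dL" % n_words, data)
    k = len(key_words)
    out = struct.pack("<%dL" % n_words,
                      *(w ^ key_words[i % k] for i, w in enumerate(data_words)))
    # Handle the 0-3 trailing bytes one at a time, keeping the key aligned
    tail_start = n_words * 4
    tail = bytes(b ^ key[(tail_start + j) % len(key)]
                 for j, b in enumerate(data[tail_start:]))
    return out + tail
```

Because the key length is a multiple of 4, word `i` of the data always lines up with word `i % k` of the key, so the result matches a plain byte-by-byte XOR.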