I'm doing a programming challenge and I'm going crazy with one of the challenges. In the challenge, I need to compute the MD5 of a string. The string is given in the following form:
n[c]
: Where n
is a number and c
is a character. For example: b3[a2[c]]
=> baccaccacc
Everything went ok until I was given the following string:
1[2[3[4[5[6[7[8[9[10[11[12[13[a]]]]]]]]]]]]]
This strings turns into a string with 6227020800 a
's. This string is more than 6GB, so it's nearly impossible to compute it in practical time. So, here is my question:
Are there any properties of MD5 that I can use here?
I know that there has to be a form to make it in short time, and I suspect it has to be related to the fact that all the string has is the same character repeated multiple times.
Two files can have the same MD5 hash even if there are different. As the MD5 algorithm can take an infinity of input and give a limited number of output, it's not impossible, even if the probability of collision is very low.
It's definitely possible, because there are more possible Strings than possible hash code values. But that doesn't mean it's useful .
Yes, it will be consistent since strings are immutable. However, I think you're misusing the dictionary. You should let the dictionary take the hash of the string for you by using the string as the key. Hashes are not guaranteed to be unique, so you may overwrite one key with another.
At the same time, two keys can also generate an identical hash. This phenomenon is called a collision. A good hash function never produces the same hash value from two different inputs.
You probably have created a (recursive) function to produce the result as a single value. Instead you should use a generator to produce the result as a stream of bytes. These you can then feed byte by byte into your MD5 hash routine. The size of the stream does not matter this way, it will just have an impact on the computation time, not on the memory used.
Here's an example using a single-pass parser:
import re, sys, md5
def p(s, pos, callBack):
while pos < len(s):
m = re.match(r'(d+)[', s[pos:])
if m: # repetition?
number = m.group(1)
for i in range(int(number)):
endPos = p(s, pos+len(number)+1, callBack)
pos = endPos
elif s[pos] == ']':
return pos + 1
else:
callBack(s[pos])
pos += 1
return pos + 1
digest = md5.new()
def feed(s):
digest.update(s)
sys.stdout.write(s)
sys.stdout.flush()
end = p(sys.argv[1], 0, feed)
print
print "MD5:", digest.hexdigest()
print "finished parsing input at pos", end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With