Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hashing same character multiple times

I'm doing a programming challenge and I'm going crazy with one of the challenges. In the challenge, I need to compute the MD5 of a string. The string is given in the following form:

n[c]: Where n is a number and c is a character. For example: b3[a2[c]] => baccaccacc

Everything went ok until I was given the following string:

1[2[3[4[5[6[7[8[9[10[11[12[13[a]]]]]]]]]]]]]

This strings turns into a string with 6227020800 a's. This string is more than 6GB, so it's nearly impossible to compute it in practical time. So, here is my question:

Are there any properties of MD5 that I can use here?

I know that there has to be a form to make it in short time, and I suspect it has to be related to the fact that all the string has is the same character repeated multiple times.

like image 767
Lars Avatar asked May 06 '13 13:05

Lars


People also ask

Can 2 values have the same hash?

Two files can have the same MD5 hash even if there are different. As the MD5 algorithm can take an infinity of input and give a limited number of output, it's not impossible, even if the probability of collision is very low.

Can 2 strings have the same hash?

It's definitely possible, because there are more possible Strings than possible hash code values. But that doesn't mean it's useful .

Is the hash of a string always the same?

Yes, it will be consistent since strings are immutable. However, I think you're misusing the dictionary. You should let the dictionary take the hash of the string for you by using the string as the key. Hashes are not guaranteed to be unique, so you may overwrite one key with another.

Can hashes be the same?

At the same time, two keys can also generate an identical hash. This phenomenon is called a collision. A good hash function never produces the same hash value from two different inputs.


1 Answers

You probably have created a (recursive) function to produce the result as a single value. Instead you should use a generator to produce the result as a stream of bytes. These you can then feed byte by byte into your MD5 hash routine. The size of the stream does not matter this way, it will just have an impact on the computation time, not on the memory used.

Here's an example using a single-pass parser:

import re, sys, md5

def p(s, pos, callBack):
  while pos < len(s):
    m = re.match(r'(d+)[', s[pos:])
    if m:  # repetition?
      number = m.group(1)
      for i in range(int(number)):
        endPos = p(s, pos+len(number)+1, callBack)
      pos = endPos
    elif s[pos] == ']':
      return pos + 1
    else:
      callBack(s[pos])
      pos += 1
  return pos + 1

digest = md5.new()
def feed(s):
  digest.update(s)
  sys.stdout.write(s)
  sys.stdout.flush()

end = p(sys.argv[1], 0, feed)
print
print "MD5:", digest.hexdigest()
print "finished parsing input at pos", end
like image 165
Alfe Avatar answered Sep 29 '22 20:09

Alfe