I'm doing a programming challenge and I'm going crazy with one of the challenges. In the challenge, I need to compute the MD5 of a string. The string is given in the following form: <code>n[c]</code>: Where <code>n</code> is a number and <code>c</code> is a character. For example: <code>b3[a2[c]]</code> => <code>baccaccacc</code> Everything went ok until I was given the following string: <code>1[2[3[4[5[6[7[8[9[10[11[12[13[a]]]]]]]]]]]]]</code> This strings turns into a string with 6227020800 <code>a</code>'s. This string is more than 6GB, so it's nearly impossible to compute it in practical time. So, here is my question: Are there any properties of MD5 that I can use here? I know that there has to be a form to make it in short time, and I suspect it has to be related to the fact that all the string has is the same character repeated multiple times.

You probably have created a (recursive) function to produce the result as a single value. Instead you should use a generator to produce the result as a stream of bytes. These you can then feed byte by byte into your MD5 hash routine. The size of the stream does not matter this way, it will just have an impact on the computation time, not on the memory used. Here's an example using a single-pass parser: <pre class="prettyprint"><code>import re, sys, md5 def p(s, pos, callBack): while pos < len(s): m = re.match(r'(d+)[', s[pos:]) if m: # repetition? number = m.group(1) for i in range(int(number)): endPos = p(s, pos+len(number)+1, callBack) pos = endPos elif s[pos] == ']': return pos + 1 else: callBack(s[pos]) pos += 1 return pos + 1 digest = md5.new() def feed(s): digest.update(s) sys.stdout.write(s) sys.stdout.flush() end = p(sys.argv[1], 0, feed) print print "MD5:", digest.hexdigest() print "finished parsing input at pos", end </code></pre>

Hashing same character multiple times

I'm doing a programming challenge and I'm going crazy with one of the challenges. In the challenge, I need to compute the MD5 of a string. The string is given in the following form:

n[c]: Where n is a number and c is a character. For example: b3[a2[c]] => baccaccacc

Everything went ok until I was given the following string:

1[2[3[4[5[6[7[8[9[10[11[12[13[a]]]]]]]]]]]]]

This strings turns into a string with 6227020800 a's. This string is more than 6GB, so it's nearly impossible to compute it in practical time. So, here is my question:

Are there any properties of MD5 that I can use here?

I know that there has to be a form to make it in short time, and I suspect it has to be related to the fact that all the string has is the same character repeated multiple times.

Can 2 values have the same hash?

Two files can have the same MD5 hash even if there are different. As the MD5 algorithm can take an infinity of input and give a limited number of output, it's not impossible, even if the probability of collision is very low.

Can 2 strings have the same hash?

It's definitely possible, because there are more possible Strings than possible hash code values. But that doesn't mean it's useful .

Is the hash of a string always the same?

Yes, it will be consistent since strings are immutable. However, I think you're misusing the dictionary. You should let the dictionary take the hash of the string for you by using the string as the key. Hashes are not guaranteed to be unique, so you may overwrite one key with another.

Can hashes be the same?

At the same time, two keys can also generate an identical hash. This phenomenon is called a collision. A good hash function never produces the same hash value from two different inputs.

You probably have created a (recursive) function to produce the result as a single value. Instead you should use a generator to produce the result as a stream of bytes. These you can then feed byte by byte into your MD5 hash routine. The size of the stream does not matter this way, it will just have an impact on the computation time, not on the memory used.

Here's an example using a single-pass parser:

import re, sys, md5

def p(s, pos, callBack):
  while pos < len(s):
    m = re.match(r'(d+)[', s[pos:])
    if m:  # repetition?
      number = m.group(1)
      for i in range(int(number)):
        endPos = p(s, pos+len(number)+1, callBack)
      pos = endPos
    elif s[pos] == ']':
      return pos + 1
    else:
      callBack(s[pos])
      pos += 1
  return pos + 1

digest = md5.new()
def feed(s):
  digest.update(s)
  sys.stdout.write(s)
  sys.stdout.flush()

end = p(sys.argv[1], 0, feed)
print
print "MD5:", digest.hexdigest()
print "finished parsing input at pos", end

Hashing same character multiple times

Tags:

python

string

algorithm

hash

md5

Lars

People also ask

1 Answers

Alfe

Recent Activity

Donate For Us

Hashing same character multiple times

Tags:

python

string

algorithm

hash

md5

Lars

People also ask

1 Answers

Alfe

Related questions

Recent Activity

Donate For Us