Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to handle a dict variable with 2^50 elements?

I have to find SHA256 hashes of 2^25 random strings. And then look for collision (using birthday paradox for the last, say, 50 bits of the hash only).

I am storing the string:hash pair in a dict variable. Then sorting the variable with values (not keys) and then looking for collision using a O(n) loop.

The problem is that since there are 2^25 strings and their 2^25 hashes, so the dict variable has 2^50 values in it. This is EXTREMELY resource intensive. So, how do I do this with limited RAM, say, around 1GB?

What I have already tried:
1. Running this with a 6GB swap space. The program ran overnight and was still not done. This is essentially even slower than an O(n_square) search! The hashes are calculated with RAM usage of about 3.2 GB. But after that when it comes to the sort command, the RAM used starts shooting up again! I though the python sort uses In-Place-Quicksort :(
2. I have stored only the hashes and found a collision. But could not find the corresponding string since did not store it.

I am not supposed to use databases etc. At the most a text file but that doesn't help. Also, I am pretty new to Python but do not let that limit the level of your answer.

PS: this is an assignment. Some claim to have found the collisions in under one minute with 300MB RAM. Don't know if that is true. I have solved the problem but the answer is unreachable! At work so do not have access to code right now. Will add soon.

Code:

from Crypto.Hash import SHA256
import os
import random
import string
from operator import itemgetter

def shaa():

    trun=[]
    clist={}
    for i in range(0,33554432):

        sha=SHA256.new(str(i)).hexdigest()
        sha=int(bin(int(sha,16))[-50:],2)
        clist[i]=sha

    print 'Hashes done.'

    clist=sorted(clist.items(), key=itemgetter(1))  
    for i in range(0,33554432):

        if(clist[i]==clist[i+1]):
            #print string[i],string[i+1]
            print clist[i]
            return 1
    return 2

result=2
while(result==2):
    result=shaa()
like image 591
ritratt Avatar asked Apr 10 '12 13:04

ritratt


People also ask

Can a dictionary have multiple values for one key?

General Idea: In Python, if we want a dictionary to have multiple values for a single key, we need to store these values in their own container within the dictionary. To do so, we need to use a container as a value and add our multiple values to that container. Common containers are lists, tuples, and sets.

Can dictionary have two values?

No, each key in a dictionary should be unique. You can't have two keys with the same value. Attempting to use the same key again will just overwrite the previous value stored. If a key needs to store multiple values, then the value associated with the key should be a list or another dictionary.


1 Answers

I'd go for something like this:

Open 16 files (opened in binary mode should be fine; this will be easiest if all your strings have the same length). Generate your strings and hashes, and write them to a file depending on the first 4 bit of the hash. Then load and process each file separately. This will reduce memory usage by a factor of 16. (Of course you can use any number of files as long as you don't run out of file handles. Having to open and close each file on every access will become rather slow.)

If generating the strings and hashes is relatively inexpensive, you don't even need the files. Simply do 16 passes, and in each pass keep only thoses hashes the upper nibbles of which match the pass number.

like image 190
Sven Marnach Avatar answered Nov 11 '22 08:11

Sven Marnach