Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare a dict to itself and remove similar keys efficiently

Tags:

python

I would like to compare elements in a dictionary to one another, and remove items according to some comparison criteria. And I'd love it to be efficient. I have a function that can do this, but it repeatedly copies the dictionay. Surely there is a superior way:

mydict = {1:5,2:7,3:9,4:9,7:7,8:0,111:43,110:77}

def partial_duplicate_destroyer(mydict,tolerance):
    for key1 in mydict.keys():
        mydict_copy = mydict.copy()
        for key2 in mydict_copy.keys():
            if key2 - tolerance < key1 < key2 + tolerance and not(key1 == key2):
                del(mydict[key1])
                break
    return mydict

print partial_duplicate_destroyer(mydict,2)
print partial_duplicate_destroyer(mydict,20)
print partial_duplicate_destroyer(mydict,200)

#correct output:
# {4: 9, 8: 0, 111: 43}
# {8: 0, 111: 43}
# {111: 43}
like image 308
fraxel Avatar asked Apr 18 '12 18:04

fraxel


1 Answers

This approach could be reduced down to:

from itertools import combinations

def partial_duplicate_destroyer(mydict, tolerance):
    #Modifies in-place. Returns only as a convenience. Copy if you don't want this behaviour.
    for key1, key2 in combinations(mydict, 2):
       if key1 in mydict and key2 - tolerance < key1 < key2 + tolerance:
         del mydict[key1]
    return mydict

If we try this:

>>> mydict = {1:5,2:7,3:9,4:9,7:7,8:0,111:43,110:77}
>>> partial_duplicate_destroyer(mydict, 2)
{4: 9, 8: 0, 111: 43}
>>> partial_duplicate_destroyer(mydict, 20)
{8: 0, 111: 43}
>>> partial_duplicate_destroyer(mydict, 200)
{111: 43}
>>> mydict
{111: 43}

This uses itertools.combinations() to produce all possible combinations of keys (without repeating). This is much more efficient as you don't bother working where you have the keys the same, and it's done more efficiently in C rather than your side with Python.

Note that here you are modifying mydict in place - that is at the end of this, mydict is {111: 43} - you need to copy the dict and work on it in the function, rather than directly on it, if you don't want this behaviour. This is what is shown in the last line.

like image 77
Gareth Latty Avatar answered Oct 20 '22 20:10

Gareth Latty