 

Combine two large dictionaries by key - fastest approach

I have two large dictionaries. Below is a small example for demonstration, but imagine each dictionary having close to 100k records:

d1 = {
    '0001': [('skiing',0.789),('snow',0.65),('winter',0.56)],
    '0002': [('drama', 0.89),('comedy', 0.678),('action',-0.42),
             ('winter',-0.12),('kids',0.12)]
}

d2 = {
    '0001': [('action', 0.89),('funny', 0.58),('sports',0.12)],
    '0002': [('dark', 0.89),('Mystery', 0.678),('crime',0.12), ('adult',-0.423)]
}

I want a dictionary that combines the values from both dictionaries by key:

{
    '0001': [
        ('skiing', 0.789), ('snow', 0.65), ('winter', 0.56),
        [('action', 0.89), ('funny', 0.58), ('sports', 0.12)]
    ],
    '0002': [
        ('drama', 0.89), ('comedy', 0.678), ('action', -0.42),
        ('winter', -0.12), ('kids', 0.12),
        [('dark', 0.89), ('Mystery', 0.678), ('crime', 0.12), ('adult', -0.423)]
    ]
}

The way I would achieve this is:

for key, value in d1.iteritems():
    if key in d2:
        d1[key].append(d2[key])

But after reading in many places, I found out that iteritems() is really slow because it doesn't actually use C data structures but Python functions instead. How can I do this combine/merge process quickly and efficiently?
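For reference, a minimal Python 3 sketch of the same append-based merge. dict.iteritems() no longer exists in Python 3, and the keys() views there are set-like, so `&` can restrict the loop to the keys both dicts share. This is only a port of the loop in the question, not a benchmarked speedup.

```python
# Python 3 port of the question's append-based merge (a sketch, not benchmarked).
d1 = {
    '0001': [('skiing', 0.789), ('snow', 0.65), ('winter', 0.56)],
    '0002': [('drama', 0.89), ('comedy', 0.678), ('action', -0.42),
             ('winter', -0.12), ('kids', 0.12)],
}
d2 = {
    '0001': [('action', 0.89), ('funny', 0.58), ('sports', 0.12)],
    '0002': [('dark', 0.89), ('Mystery', 0.678), ('crime', 0.12), ('adult', -0.423)],
}

# keys() returns a set-like view in Python 3; "&" gives the keys present in both.
for key in d1.keys() & d2.keys():
    # append() adds d2's whole list as one nested element, matching the desired output
    d1[key].append(d2[key])

print(d1['0001'])
```

Only the values of d1 are mutated inside the loop, never its keys, so iterating over the key intersection while appending is safe.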

asked Mar 17 '23 17:03 by add-semi-colons



2 Answers

for k, v in d2.items():
    if k in d1:
        d1[k].extend(v)  # shared key: add d2's tuples to the end of d1's list
    else:
        d1[k] = v        # key only in d2: copy it into d1
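A small run of the loop above, wrapped in a hypothetical helper `merge_into` so it can be called and checked. The sample data reuses part of the question's '0001' entry plus a made-up key '0003' that exists only in d2, to exercise the else branch.

```python
def merge_into(d1, d2):
    """Hypothetical helper wrapping the loop above: merges d2 into d1 in place."""
    for k, v in d2.items():
        if k in d1:
            d1[k].extend(v)  # shared key: add d2's tuples to the end of d1's list
        else:
            d1[k] = v        # key only in d2: reuse d2's list object as-is
    return d1


# Illustrative input; '0003' and its tuple are hypothetical sample data.
d1 = {'0001': [('skiing', 0.789), ('snow', 0.65)]}
d2 = {'0001': [('action', 0.89)], '0003': [('horror', 0.3)]}

merged = merge_into(d1, d2)
print(merged)
# {'0001': [('skiing', 0.789), ('snow', 0.65), ('action', 0.89)], '0003': [('horror', 0.3)]}
```

Two differences from the question's code are worth noting: extend() flattens d2's tuples into d1's list instead of nesting d2's list as one element (use append() to get the nested layout shown in the question), and keys found only in d2 are added too. Also, `d1[k] = v` makes both dicts share the same list object; use `d1[k] = list(v)` if later changes to one should not show up in the other.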
answered Apr 02 '23 19:04 by ferhat



I think you need to merge the dicts

from collections import Counter
res = Counter(d1) + Counter(d2)
>>> res
Counter({'0001': [('skiing', 0.789), ('snow', 0.65), ('winter', 0.56 ...

For example:

from collections import Counter

d1 = {"a":[1,2], "b":[]}
d2 = {"a":[1,3], "b":[5,6]}

res = Counter(d1) + Counter(d2)

>>> res
Counter({'b': [5, 6], 'a': [1, 2, 1, 3]})

This approach also handles keys that appear only in d2, like:

d1 = {"a":[1,2], "b":[]}
d2 = {"a":[1,3], "b":[5,6], "c":["ff"]}
res = Counter(d1) + Counter(d2)

>>> res
Counter({'c': ['ff'], 'b': [5, 6], 'a': [1, 2, 1, 3]})
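A caveat on the Counter approach: Counter's + operator adds each pair of values, treats a key missing from one side as 0, and keeps only results that compare greater than 0. With list values, that comparison works only because Python 2 allowed comparing a list with an int; in Python 3 it raises TypeError. In either version, a key present only in d1 fails on `list + 0`. A minimal Python 3 sketch of the same concatenation with plain dicts, using a hypothetical helper `concat_merge`:

```python
def concat_merge(d1, d2):
    """Hypothetical helper: return a new dict mapping each key to d1[k] + d2[k].

    A key missing from one dict contributes nothing, so keys that appear in
    only one input are kept as well. Neither input list is mutated.
    """
    merged = {}
    for d in (d1, d2):
        for k, v in d.items():
            # setdefault creates a fresh list the first time a key is seen,
            # then extend() copies the elements in, leaving v untouched
            merged.setdefault(k, []).extend(v)
    return merged


d1 = {"a": [1, 2], "b": []}
d2 = {"a": [1, 3], "b": [5, 6], "c": ["ff"]}
print(concat_merge(d1, d2))
# {'a': [1, 2, 1, 3], 'b': [5, 6], 'c': ['ff']}
```

The result is a plain dict in first-seen key order rather than a Counter, and the inputs are left unchanged, unlike the in-place loop in the other answer.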
answered Apr 02 '23 17:04 by itzMEonTV
