 

A proper way to add values of multiple dictionaries to another dictionary using the same key

I want to combine the values from 3 different dictionaries into 1 "all_in_one" dictionary, matching them by their shared key.

I have 3 big dictionaries built from the same text corpus. Each dictionary holds values taken from the same lines (multiple lines, actually) of the corpus files, but from different columns. All 3 dictionaries have the same keys.

They look like this:

tokens = {"token1": 10, "token2": 56, "token3": 90, ...}

lemmas = {"token1": "lemma1", "token2": "lemma2", "token3": "lemma3", ...}

categs = {"token1": "categX", "token2": "categY", "token3": "categZ", ...}

I want to collect these values into another dictionary that looks like this:

all_in_one = {"token1": [tokens["token1"], lemmas["token1"], categs["token1"]],
              "token2": [tokens["token2"], lemmas["token2"], categs["token2"]], ... }

This is the loop I have now:

all_in_one = {}

for tk, tv in tokens.items():
    for lk, lv in lemmas.items():
        for ck, cv in categs.items():
            if tk == lk == ck:
                all_in_one[tk] = [tv, lv, cv]

The problem is that it works (I don't know if it's correct), but only with a small number of files, and I have 500k files. I haven't tried running it on the full corpus, because even the first try with 100 files ran for a few hours without finishing (100 files = 6,500 tokens, so I assume that's 6500^3 iterations). I've only tested it with 10 and 20 files.
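To see why the run time explodes, here is a small sketch that counts how often the innermost comparison of the triple loop runs compared with a single loop over the keys. The dictionaries are synthetic stand-ins built by a hypothetical `make_dicts` helper, not the real corpus data.

```python
# Count innermost iterations of the triple loop vs. a single loop.
# make_dicts builds small synthetic dictionaries (hypothetical stand-in data).

def make_dicts(n):
    keys = [f"token{i}" for i in range(1, n + 1)]
    tokens = {k: i for i, k in enumerate(keys)}
    lemmas = {k: f"lemma{i}" for i, k in enumerate(keys)}
    categs = {k: f"categ{i}" for i, k in enumerate(keys)}
    return tokens, lemmas, categs


def nested_loop(tokens, lemmas, categs):
    all_in_one, checks = {}, 0
    for tk, tv in tokens.items():
        for lk, lv in lemmas.items():
            for ck, cv in categs.items():
                checks += 1  # one key comparison per innermost iteration
                if tk == lk == ck:
                    all_in_one[tk] = [tv, lv, cv]
    return all_in_one, checks


def single_loop(tokens, lemmas, categs):
    all_in_one, steps = {}, 0
    for tk, tv in tokens.items():
        steps += 1  # one iteration per key, each doing two average O(1) dict lookups
        all_in_one[tk] = [tv, lemmas[tk], categs[tk]]
    return all_in_one, steps


for n in (10, 20, 40):
    dicts = make_dicts(n)
    nested, checks = nested_loop(*dicts)
    single, steps = single_loop(*dicts)
    assert nested == single  # same result when all keys match
    print(f"n={n}: nested={checks} iterations, single={steps} iterations")
```

The nested version grows as n**3 (1000, 8000, 64000 for the sizes above) while the single loop grows as n. At 6,500 keys the triple loop would do about 2.7 × 10^11 comparisons.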

Is this even a proper loop for the job (combining the values of 3 dicts into another dict)? If it is (which I doubt, given how long it takes), is there a way to optimize it?

asked Mar 04 '23 18:03 by Daniel Borysowski



2 Answers

My answer assumes that all three dictionaries have exactly the same keys. In that case, you don't need three nested for loops; a single loop is enough. Since the keys are the same and you only need to group together the values that share a key, you can loop over the keys of any one of the dictionaries and look each key up in the other two. A dict lookup takes constant time on average, so this is linear in the number of keys rather than cubic:

all_in_one = {}

for tk, tv in tokens.items():
    all_in_one[tk] = [tv, lemmas[tk], categs[tk]]
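If the keys might not be exactly the same, `lemmas[tk]` raises a `KeyError` for a missing key. The question's triple loop only keeps keys found in all three dictionaries (it requires `tk == lk == ck`), so a minimal single-pass variation with that same behavior, not part of the original answer, tests membership first. The sample data below, including `"categW"`, is made up for illustration.

```python
tokens = {"token1": 10, "token2": 56, "token3": 90}
lemmas = {"token1": "lemma1", "token2": "lemma2"}                      # no "token3"
categs = {"token1": "categX", "token2": "categY", "token4": "categW"}  # extra "token4"

# Keep only keys present in every dict, matching what the original
# `if tk == lk == ck` triple loop produces, but in a single pass.
all_in_one = {}
for tk, tv in tokens.items():
    if tk in lemmas and tk in categs:  # average O(1) membership tests
        all_in_one[tk] = [tv, lemmas[tk], categs[tk]]

print(all_in_one)
# {'token1': [10, 'lemma1', 'categX'], 'token2': [56, 'lemma2', 'categY']}
```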
answered Mar 07 '23 07:03 by Sheldore



Since the keys are identical across all dictionaries, you can use a dictionary comprehension that iterates over the keys of any one of them. To avoid repeating the lookup logic, you can use operator.itemgetter:

from operator import itemgetter

tokens = {"token1": 10, "token2": 56, "token3": 90}
lemmas = {"token1": "lemma1", "token2": "lemma2", "token3": "lemma3"}
categs = {"token1": "categX", "token2": "categY", "token3": "categZ"}

all_in_one = {k: list(map(itemgetter(k), (tokens, lemmas, categs))) for k in tokens}

# {'token1': [10, 'lemma1', 'categX'],
#  'token2': [56, 'lemma2', 'categY'],
#  'token3': [90, 'lemma3', 'categZ']}
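To unpack what the comprehension does for each key: `itemgetter(k)` returns a callable that performs `obj[k]` on whatever object it receives, and `map` applies that callable to each dictionary in turn. A small illustration with part of the sample data (the name `get_token2` is just for this example):

```python
from operator import itemgetter

tokens = {"token1": 10, "token2": 56, "token3": 90}
lemmas = {"token1": "lemma1", "token2": "lemma2", "token3": "lemma3"}

get_token2 = itemgetter("token2")  # callable equivalent to lambda d: d["token2"]
print(get_token2(tokens))  # 56
print(get_token2(lemmas))  # lemma2

# map() applies the same getter to each dictionary in the tuple
print(list(map(get_token2, (tokens, lemmas))))  # [56, 'lemma2']
```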

Other alternatives include defining a list explicitly or using a list comprehension:

# define list explicitly
all_in_one = {k: [tokens[k], lemmas[k], categs[k]] for k in tokens}

# use list comprehension
all_in_one = {k: [lst[k] for lst in (tokens, lemmas, categs)] for k in tokens}
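The list-comprehension form generalizes naturally to any number of dictionaries. Here is a minimal sketch wrapping it in a hypothetical helper, `combine_by_key`, which, like the answers above, assumes every dictionary has the same keys as the first one:

```python
def combine_by_key(*dicts):
    """Hypothetical helper: merge values from several dicts that share keys.

    Iterates over the first dict's keys and assumes every other dict
    contains the same keys (a KeyError is raised otherwise).
    """
    if not dicts:
        return {}
    first = dicts[0]
    return {k: [d[k] for d in dicts] for k in first}


tokens = {"token1": 10, "token2": 56, "token3": 90}
lemmas = {"token1": "lemma1", "token2": "lemma2", "token3": "lemma3"}
categs = {"token1": "categX", "token2": "categY", "token3": "categZ"}

all_in_one = combine_by_key(tokens, lemmas, categs)
print(all_in_one["token2"])  # [56, 'lemma2', 'categY']
```

Adding another column later only means passing one more dictionary; the helper's body stays the same.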
answered Mar 07 '23 07:03 by jpp
