I want to add 3 different values of 3 different dictionaries to 1 "all_in_one" dictionary based on the same key.
I have 3 big dictionaries built from the same text corpus. Each dictionary holds values taken from the same lines of the same files, but from a different column. All 3 dictionaries have the same keys.
They look like this:
tokens = {"token1": 10, "token2": 56, "token3": 90, ...}
lemmas = {"token1": "lemma1", "token2": "lemma2", "token3": "lemma3", ...}
categs = {"token1": "categX", "token2": "categY", "token3": "categZ", ...}
I want to add these values to another dictionary to have it look like this:
all_in_one = {"token1": [tokens[value1], lemmas[value1], categs[value1]],
"token2": [tokens[value2], lemmas[value2], categs[value2]], ... }
I have such a loop:
all_in_one = {}
for tk, tv in tokens.items():
    for lk, lv in lemmas.items():
        for ck, cv in categs.items():
            if tk == lk == ck:
                all_in_one[tk] = [tv, lv, cv]
The problem is that it works (I don't know whether it's correct), but only with a small number of files. I have 500k files. I haven't tried to run it on the final corpus, because even the first try with 100 files ran for a few hours without finishing (100 files = 6500 tokens, so I assume it's 6500^3 iterations...). I've only tested it with 10 and 20 files.
Is this even a proper loop for the task (adding the values of 3 dicts into another dict)? If yes (which I doubt, given the time needed), is there a way to optimize it?
My answer assumes that all three dictionaries have exactly the same keys. In that case, you don't need 3 nested for loops; a single loop is enough. Since the keys are the same and you only need to combine the values that share a key, you can loop over the items of any one dictionary and look up the other two by key:
all_in_one = {}
for tk, tv in tokens.items():
    all_in_one[tk] = [tv, lemmas[tk], categs[tk]]
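If the keys turn out not to match exactly, the direct lookups above would raise a KeyError. A minimal sketch for that case, assuming you only want entries present in all three dictionaries, is to loop over the shared keys. The helper name `merge_common` and the sample data are hypothetical and only for illustration:

```python
def merge_common(tokens, lemmas, categs):
    """Combine values for keys that appear in all three dictionaries."""
    # set.intersection accepts any iterables; iterating a dict yields its keys
    common = set(tokens).intersection(lemmas, categs)
    return {k: [tokens[k], lemmas[k], categs[k]] for k in common}


# Sample data (assumed): "token3" is missing from lemmas on purpose
tokens = {"token1": 10, "token2": 56, "token3": 90}
lemmas = {"token1": "lemma1", "token2": "lemma2"}
categs = {"token1": "categX", "token2": "categY", "token3": "categZ"}

all_in_one = merge_common(tokens, lemmas, categs)
```

This still does one pass over the keys with constant-time lookups on average, rather than comparing every key against every other key.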
Since the keys are identical across all dictionaries, you can use a dictionary comprehension iterating over the keys of any one of them. To reduce repeated logic, you can use operator.itemgetter:
from operator import itemgetter
tokens = {"token1": 10, "token2": 56, "token3": 90}
lemmas = {"token1": "lemma1", "token2": "lemma2", "token3": "lemma3"}
categs = {"token1": "categX", "token2": "categY", "token3": "categZ"}
all_in_one = {k: list(map(itemgetter(k), (tokens, lemmas, categs))) for k in tokens}
# {'token1': [10, 'lemma1', 'categX'],
#  'token2': [56, 'lemma2', 'categY'],
#  'token3': [90, 'lemma3', 'categZ']}
Other alternatives include defining a list explicitly or using a list comprehension:
# define list explicitly
all_in_one = {k: [tokens[k], lemmas[k], categs[k]] for k in tokens}
# use list comprehension
all_in_one = {k: [lst[k] for lst in (tokens, lemmas, categs)] for k in tokens}
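As a quick sanity check, the variants above should all build the same dictionary. A small sketch, assuming the sample data from this answer and using illustrative variable names:

```python
from operator import itemgetter

tokens = {"token1": 10, "token2": 56, "token3": 90}
lemmas = {"token1": "lemma1", "token2": "lemma2", "token3": "lemma3"}
categs = {"token1": "categX", "token2": "categY", "token3": "categZ"}
sources = (tokens, lemmas, categs)

# The three variants, each a single pass over the keys of tokens
with_itemgetter = {k: list(map(itemgetter(k), sources)) for k in tokens}
explicit = {k: [tokens[k], lemmas[k], categs[k]] for k in tokens}
nested = {k: [d[k] for d in sources] for k in tokens}

print(with_itemgetter == explicit == nested)  # True
```

Which one to use is mostly a matter of readability; the explicit list is probably the easiest to follow when there are only three dictionaries.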