I have a list, say:
NUM = 100
my_list = list(range(NUM))
I would like to generate a dict where the key is equal to the value, something like:
my_dict = {item: item for item in my_list}
or:
my_dict = dict(zip(my_list, my_list))
I have run some micro-benchmarks, and it looks like they have similar speed, but I was hoping that the second would be much faster, since the looping should be happening in C.
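For reference, a minimal version of such a micro-benchmark (absolute numbers will of course vary by machine and Python version):

from timeit import timeit

NUM = 100
my_list = list(range(NUM))

# Time both constructs over the same input.
t_comp = timeit('{item: item for item in my_list}', globals=globals(), number=100_000)
t_zip = timeit('dict(zip(my_list, my_list))', globals=globals(), number=100_000)
print(f'dict comprehension: {t_comp:.3f}s   dict(zip): {t_zip:.3f}s')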
For example, the following construct:
my_dict = {key: SOMETHING for key in keys}
translates into the much faster:
my_dict = dict.fromkeys(keys, SOMETHING)
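As a quick sanity check, the two spellings really are equivalent; note that in both cases every key refers to the one SOMETHING object:

SOMETHING = object()
keys = range(5)

# Both build the same mapping, with every key sharing the single value
# object (something to be aware of when SOMETHING is mutable).
assert {key: SOMETHING for key in keys} == dict.fromkeys(keys, SOMETHING)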
So, my question is: is there any similar construct for {x: x for x in my_list}?
I have checked dir(dict) and there seems to be nothing in this direction (I would expect it to be called something like dict.fromitems()).
A method like dict.fromitems() would have broader application than this specific use-case, because dict.fromitems(keys, values) could, in principle, substitute for both:
{k: v for k, v in zip(keys, values)}
and:
dict(zip(keys, values))
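To be clear, dict.fromitems() does not exist; a pure-Python sketch of the intended semantics might look like this:

def fromitems(keys, values):
    # Hypothetical stand-in for the proposed dict.fromitems():
    # pair keys with values and build the dict in one pass.
    return dict(zip(keys, values))

assert fromitems('abc', [1, 2, 3]) == {'a': 1, 'b': 2, 'c': 3}

The hoped-for benefit would be doing that pairing at the C level, without building intermediate tuples.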
No, there is no faster method available for dictionaries.
That's because the performance cost is all in processing each item from the iterator: computing its hash and slotting the key into the dictionary's hash table structures (including growing those structures dynamically). Executing the dictionary comprehension bytecode is really insignificant in comparison.
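You can see this with dis: the bytecode loop the comprehension executes per item is tiny (the exact opcodes differ between Python versions):

import dis

# The comprehension's loop body is only a handful of opcodes; the real
# per-item work (hashing the key, inserting it into the hash table) is
# done inside the interpreter's C code, e.g. behind MAP_ADD.
dis.dis(compile('{k: k for k in it}', '<demo>', 'eval'))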
dict(zip(it, it)), {k: k for k in it} and dict.fromkeys(it) are all close in speed:
>>> from timeit import Timer
>>> tests = {
... 'dictcomp': '{k: k for k in it}',
... 'dictzip': 'dict(zip(it, it))',
... 'fromkeys': 'dict.fromkeys(it)',
... }
>>> timings = {n: [] for n in tests}
>>> for magnitude in range(2, 8):
... it = range(10 ** magnitude)
... for name, test in tests.items():
... peritemtimes = []
... for repetition in range(3):
... count, total = Timer(test, 'from __main__ import it').autorange()
... peritemtimes.append(total / count / (10 ** magnitude))
... timings[name].append(min(peritemtimes)) # best of 3
...
>>> for name, times in timings.items():
... print(f'{name:>8}', *(f'{t * 10 ** 9:5.1f} ns' for t in times), sep=' | ')
...
dictcomp | 46.5 ns | 47.5 ns | 50.0 ns | 79.0 ns | 101.1 ns | 111.7 ns
dictzip | 49.3 ns | 56.3 ns | 71.6 ns | 109.7 ns | 132.9 ns | 145.8 ns
fromkeys | 33.9 ns | 37.2 ns | 37.4 ns | 62.7 ns | 87.6 ns | 95.7 ns
That's a table of the per-item cost for each technique, from 100 to 10 million items. The timings go up as the additional cost of growing the hash table structures accumulates.
Sure, dict.fromkeys() can process items a little bit faster, but it's not an order of magnitude faster than the other approaches. Its (small) speed advantage does not come from being able to iterate in C here; the difference lies purely in not having to update the value pointer each iteration: all keys point to the single value reference.
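That shared reference is easy to observe directly (a small sketch):

sentinel = object()
d = dict.fromkeys(range(3), sentinel)

# Every key shares the single value object passed to fromkeys();
# no new value needs to be fetched or stored per iteration.
assert all(v is sentinel for v in d.values())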
zip() is slower because it builds additional objects (creating a 2-item tuple for each key-value pair is not a cost-free operation), and it increases the number of iterators involved in the process: you go from a single iterator for the dictionary comprehension and dict.fromkeys(), to 3 iterators (the dict() iteration delegates, via zip(), to two separate iterators for the keys and values).
There is no point in adding a separate method to the dict class to handle this in C, because, as the numbers above show, the cost is dominated by hashing each key and slotting it into the hash table, work that already happens in C; the small iteration overhead such a method could shave off would not change the picture.