Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to generate a dict from list where key == value

I have a list, say:

NUM = 100
my_list = list(range(NUM))

I would like to generate a dict where the key is equal to the value, something like:

my_dict = {item: item for item in my_list}

or:

my_dict = dict(zip(my_list, my_list))

I have run some micro-benchmarks, and it looks like they have similar speed, but I was hoping that the second would be much faster, since the looping should be happening in C.

For example, the following construct:

my_dict = {key: SOMETHING for key in keys}

translates into the much faster:

my_dict = dict.fromkeys(k, SOMETHING)

So, my question is: is there any similar such construct for {x: x for x in my_list}?


EDIT

I have checked dir(dict) and there seems to be nothing in this direction (I would expect it to be called something like dict.fromitems()).


EDIT 2

A method like dict.fromitems() would have a broader application than this specific use-case, because:

dict.fromitems(keys, values)

could, in principle substitute both:

{k, v for k, v in zip(keys, values)}

and:

dict(zip(keys, values))
like image 798
norok2 Avatar asked Oct 04 '18 15:10

norok2


People also ask

Are Dicts faster than lists?

A dictionary is 6.6 times faster than a list when we lookup in 100 items.

How do I extract all the values of specific keys from a list of dictionaries?

Method 2: Extract specific keys from the dictionary using dict() The dict() function can be used to perform this task by converting the logic performed using list comprehension into a dictionary.

How do you create a dictionary with one key and multiple values?

In python, if we want a dictionary in which one key has multiple values, then we need to associate an object with each key as value. This value object should be capable of having various values inside it. We can either use a tuple or a list as a value in the dictionary to associate multiple values with a key.


1 Answers

No, there is no faster method available for dictionaries.

That's because the performance cost is all in processing each item from the iterator, computing its hash and slotting the key into the dictionary data hash table structures (including growing those structures dynamically). Executing the dictionary comprehension bytecode is really insignificant in comparison.

dict(zip(it, it)), {k: k for k in it} and dict.fromkeys(it) are all close in speed:

>>> from timeit import Timer
>>> tests = {
...     'dictcomp': '{k: k for k in it}',
...     'dictzip': 'dict(zip(it, it))',
...     'fromkeys': 'dict.fromkeys(it)',
... }
>>> timings = {n: [] for n in tests}
>>> for magnitude in range(2, 8):
...     it = range(10 ** magnitude)
...     for name, test in tests.items():
...         peritemtimes = []
...         for repetition in range(3):
...             count, total = Timer(test, 'from __main__ import it').autorange()
...             peritemtimes.append(total / count / (10 ** magnitude))
...         timings[name].append(min(peritemtimes))  # best of 3
...
>>> for name, times in timings.items():
...     print(f'{name:>8}', *(f'{t * 10 ** 9:5.1f} ns' for t in times), sep=' | ')
...
dictcomp |  46.5 ns |  47.5 ns |  50.0 ns |  79.0 ns | 101.1 ns | 111.7 ns
 dictzip |  49.3 ns |  56.3 ns |  71.6 ns | 109.7 ns | 132.9 ns | 145.8 ns
fromkeys |  33.9 ns |  37.2 ns |  37.4 ns |  62.7 ns |  87.6 ns |  95.7 ns

That's a table of the per-item cost for each technique, from 100 to 10 million items. The timings go up as the additional cost of growing the hash table structures accumulate.

Sure, dict.fromkeys() can process items a little bit faster, but it's not an order of magnitude faster than the other processes. It's (small) speed advantage does not come from being able to iterate in C here; the difference lies purely in not having to update the value pointer each iteration; all keys point to the single value reference.

zip() is slower because it builds additional objects (creating a 2-item tuple for each key-value pair is not a cost-free operation), and it increased the number of iterators involved in the process, you go from a single iterator for the dictionary comprehension and dict.fromkeys(), to 3 iterators (the dict() iteration delegated, via zip(), to two separate iterators for the keys and values).

There is no point in adding a separate method to the dict class to handle this in C, because

  1. is not a common enough use case anyway (creating a mapping with keys and values equal is not a common need)
  2. not going to be significantly faster in C than it would be with a dictionary comprehension anyway.
like image 200
Martijn Pieters Avatar answered Oct 03 '22 04:10

Martijn Pieters