Fastest way to generate a dict from list where key == value

I have a list, say:

NUM = 100
my_list = list(range(NUM))

I would like to generate a dict where the key is equal to the value, something like:

my_dict = {item: item for item in my_list}

or:

my_dict = dict(zip(my_list, my_list))

I have run some micro-benchmarks, and it looks like they have similar speed, but I was hoping that the second would be much faster, since the looping should be happening in C.

For example, the following construct:

my_dict = {key: SOMETHING for key in keys}

translates into the much faster:

my_dict = dict.fromkeys(keys, SOMETHING)
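
As a quick check of that equivalence, here is a minimal sketch. The concrete values given to `keys` and `SOMETHING` are assumptions chosen only for illustration:

```python
# Placeholder data, assumed for illustration only.
keys = ["a", "b", "c"]
SOMETHING = 0

via_comprehension = {key: SOMETHING for key in keys}
via_fromkeys = dict.fromkeys(keys, SOMETHING)

# Both constructs build the same mapping.
print(via_comprehension == via_fromkeys)
```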

So, my question is: is there any similar such construct for {x: x for x in my_list}?

EDIT

I have checked dir(dict) and there seems to be nothing in this direction (I would expect it to be called something like dict.fromitems()).

EDIT 2

A method like dict.fromitems() would have a broader application than this specific use-case, because:

dict.fromitems(keys, values)

could, in principle, substitute both:

{k: v for k, v in zip(keys, values)}

and:

dict(zip(keys, values))
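
To make the idea concrete, here is a rough sketch of what such a method could look like as a plain function. `fromitems` is hypothetical (dict has no such method), and the sample `keys` and `values` are assumed for illustration; the function simply wraps `dict(zip(...))`, so it is no faster than that call:

```python
def fromitems(keys, values):
    """Hypothetical helper; dict itself has no fromitems() method."""
    return dict(zip(keys, values))


# Sample data, assumed for illustration only.
keys = ["a", "b", "c"]
values = [1, 2, 3]

my_dict = fromitems(keys, values)

# Same result as the equivalent dict comprehension.
print(my_dict == {k: v for k, v in zip(keys, values)})
```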

798

asked Oct 04 '18 15:10

norok2

1 Answer

No, there is no faster method available for dictionaries.

That's because the performance cost is all in processing each item from the iterator, computing its hash and slotting the key into the dictionary's hash table structures (including growing those structures dynamically). Executing the dictionary comprehension bytecode is insignificant in comparison.
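
One way to see that the per-item work is shared by every construct is to count hash computations. The sketch below uses a hypothetical `CountingKey` class and `count_hashes` helper (neither is from the standard library); on CPython one would expect each construct to hash every key once, but treat that as an observation about the implementation rather than a language guarantee:

```python
class CountingKey:
    """Hypothetical key type that counts how often it is hashed."""

    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value


def count_hashes(build, items):
    """Return how many __hash__ calls building the dict triggers."""
    CountingKey.hash_calls = 0
    build(items)
    return CountingKey.hash_calls


items = [CountingKey(i) for i in range(1000)]
counts = {
    "dictcomp": count_hashes(lambda it: {k: k for k in it}, items),
    "dictzip": count_hashes(lambda it: dict(zip(it, it)), items),
    "fromkeys": count_hashes(lambda it: dict.fromkeys(it), items),
}
print(counts)
```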

dict(zip(it, it)), {k: k for k in it} and dict.fromkeys(it) are all close in speed:

>>> from timeit import Timer
>>> tests = {
...     'dictcomp': '{k: k for k in it}',
...     'dictzip': 'dict(zip(it, it))',
...     'fromkeys': 'dict.fromkeys(it)',
... }
>>> timings = {n: [] for n in tests}
>>> for magnitude in range(2, 8):
...     it = range(10 ** magnitude)
...     for name, test in tests.items():
...         peritemtimes = []
...         for repetition in range(3):
...             count, total = Timer(test, 'from __main__ import it').autorange()
...             peritemtimes.append(total / count / (10 ** magnitude))
...         timings[name].append(min(peritemtimes))  # best of 3
...
>>> for name, times in timings.items():
...     print(f'{name:>8}', *(f'{t * 10 ** 9:5.1f} ns' for t in times), sep=' | ')
...
dictcomp |  46.5 ns |  47.5 ns |  50.0 ns |  79.0 ns | 101.1 ns | 111.7 ns
 dictzip |  49.3 ns |  56.3 ns |  71.6 ns | 109.7 ns | 132.9 ns | 145.8 ns
fromkeys |  33.9 ns |  37.2 ns |  37.4 ns |  62.7 ns |  87.6 ns |  95.7 ns

That's a table of the per-item cost for each technique, from 100 to 10 million items. The timings go up as the additional cost of growing the hash table structures accumulates.

Sure, dict.fromkeys() can process items a little bit faster, but it's not an order of magnitude faster than the other approaches. Its (small) speed advantage does not come from being able to iterate in C here; the difference lies purely in not having to update the value pointer each iteration, because all keys point to the single value reference.
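
The single shared value reference is easy to observe. This short sketch (sample `keys` assumed for illustration) also shows that dict.fromkeys() on its own does not produce a key == value mapping, so it serves only as a baseline in the timings:

```python
keys = ["a", "b", "c"]  # sample data, assumed for illustration

# Without a second argument every value is None, not the key itself.
defaults = dict.fromkeys(keys)
print(defaults)

# With a value argument, all keys reference the same object,
# so mutating it through one key is visible through the others.
shared = dict.fromkeys(keys, [])
shared["a"].append(1)
print(shared)

# The comprehension stores a separate value reference per key.
identity = {k: k for k in keys}
print(identity)
```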

zip() is slower because it builds additional objects (creating a 2-item tuple for each key-value pair is not a cost-free operation) and because it increases the number of iterators involved in the process: you go from a single iterator for the dictionary comprehension and dict.fromkeys() to 3 iterators (the dict() iteration delegates, via zip(), to two separate iterators for the keys and values).
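
The intermediate tuples can be seen by stepping through the zip iterator by hand. This is only an illustrative sketch with assumed sample data:

```python
keys = ["a", "b", "c"]  # sample data, assumed for illustration

pairs = zip(keys, keys)  # one zip iterator wrapping two list iterators
first = next(pairs)      # each step produces a 2-item tuple
print(type(first).__name__, first)

# dict() then consumes the remaining tuples from the zip iterator.
rest = dict(pairs)
print(rest)
```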

There is no point in adding a separate method to the dict class to handle this in C, because it

1. is not a common enough use case anyway (creating a mapping with keys and values equal is not a common need), and
2. is not going to be significantly faster in C than it would be with a dictionary comprehension.

200

answered Oct 03 '22 04:10

Martijn Pieters

Tags:

performance

python

dictionary

list