Performance in Python 3 dictionary iteration: dict[key] vs. dict.items()

Tags:

performance

python

dictionary

iteration

Which of these is faster, and why? Or are they the same? Does the answer vary by any conditions (size of dictionary, type of data, etc.)?

Traditional:

for key in dict:
    x = dict[key]
    x = key

Hipster:

for key, value in dict.items():
    y = value
    y = key

I haven't seen an exact duplicate, but if there is one I'd be happy to be pointed to it.


asked Nov 18 '18 23:11

NotAnAmbiTurner

2 Answers

It turns out there is a clear difference, although not orders of magnitude: in my tests, d.items() was roughly two to three and a half times faster.

I don't know much about performance testing, but here is what I did: I created three dicts of different sizes, each smaller dict being a subset of the larger ones, and timed one full pass over each of them with both functions (Traditional vs. Hipster). I repeated that 100 times.

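The dictionary sizes (number of key-value pairs) for dict1, dict2, and dict3 are nominally 1000, 50000, and 500000, respectively. Because the keys are random strings of only one to nine characters, duplicate keys overwrite each other, so the actual sizes are smaller.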
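To see how much smaller, one can rebuild a dict of the nominal dict2 size with the same string generator and check `len()`. This is a sketch that re-implements the generator as a one-liner with the same ranges as the benchmark code; the exact count varies from run to run because the keys are random.

```python
import random


def generate_random_str_one_to_ten_chars():
    # Same ranges as the benchmark: length from randrange(1, 10), i.e. 1 to 9
    # characters, each character code from randrange(40, 126), i.e. 40 to 125.
    return "".join(
        chr(random.randrange(40, 126)) for _ in range(random.randrange(1, 10))
    )


# Only 86 distinct characters are possible, so there are just 86 one-character
# keys and 86**2 == 7396 two-character keys to go around.
alphabet_size = 126 - 40

# Build a dict the way dict2 is built: 50000 insertions with random keys.
d = {}
for _ in range(50000):
    d[generate_random_str_one_to_ten_chars()] = None

# Duplicate short keys overwrite each other, so len(d) ends up below 50000.
print(alphabet_size, len(d))
```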

d.items() was faster at every size, and its advantage was larger on the bigger dictionaries rather than on the small one. This is in line with expectations (Python generally rewards "pythonic" code).

Results (seconds per full pass over the dict; mean and standard deviation over 100 runs):

--d[key]--
dict1 -- mean: 0.0001113555802294286, st. dev: 1.9951038526222054e-05
dict2 -- mean: 0.01669296698019025, st. dev: 0.019088713496142
dict3 -- mean: 0.2553815016898443, st. dev: 0.02778986771642094

--d.items()--
dict1 -- mean: 6.005059978633653e-05, st. dev: 1.1960199272812617e-05
dict2 -- mean: 0.00507106617995305, st. dev: 0.009871762371401046
dict3 -- mean: 0.07369932165958744, st. dev: 0.023440325168927384
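The relative speedups follow directly from the reported means. This short calculation only reuses the numbers above; no new measurements are involved.

```python
# Mean times in seconds, copied from the results above.
traditional_means = {
    "dict1": 0.0001113555802294286,
    "dict2": 0.01669296698019025,
    "dict3": 0.2553815016898443,
}
hipster_means = {
    "dict1": 6.005059978633653e-05,
    "dict2": 0.00507106617995305,
    "dict3": 0.07369932165958744,
}

# A ratio above 1 means d.items() was faster by that factor.
speedups = {
    name: traditional_means[name] / hipster_means[name] for name in traditional_means
}

for name, ratio in speedups.items():
    print(f"{name}: d.items() was {ratio:.2f}x faster")
# dict1: d.items() was 1.85x faster
# dict2: d.items() was 3.29x faster
# dict3: d.items() was 3.47x faster
```

For dict2 the standard deviation is larger than the mean for both approaches, so that ratio is the least reliable of the three.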

Code (run on repl.it) that produced these results:

import timeit
import random
import statistics

def traditional(dicty):

  for key in dicty:
    x = dicty[key]
    x = key

def hipster(dicty):

  for key, value in dicty.items():
    y = value
    y = key

def generate_random_dicts():
  random_dict1, random_dict2, random_dict3 = {}, {}, {}

  for _ in range(1000):
    key = generate_random_str_one_to_ten_chars()
    val = generate_random_str_one_to_ten_chars()
    random_dict1[key] = val
    random_dict2[key] = val
    random_dict3[key] = val

  for _ in range(49000):
    key = generate_random_str_one_to_ten_chars()
    val = generate_random_str_one_to_ten_chars()
    random_dict2[key] = val
    random_dict3[key] = val

  for _ in range(450000):
    key = generate_random_str_one_to_ten_chars()
    val = generate_random_str_one_to_ten_chars()
    random_dict3[key] = val

  return [random_dict1, random_dict2, random_dict3]

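def generate_random_str_one_to_ten_chars():
  # note: randrange(1, 10) gives lengths of 1 to 9 characters, and
  # randrange(40, 126) gives character codes 40 to 125
  ret_str = ""
  for _ in range(random.randrange(1, 10, 1)):
    ret_str += chr(random.randrange(40, 126, 1))
  return ret_str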

dict1, dict2, dict3 = generate_random_dicts()

test_dicts = [dict1, dict2, dict3]

times = {}
times['traditional_times'] = {}
times['hipster_times'] = {}

for _ in range(100):

  for itr, dictx in enumerate(test_dicts):
    name = f"dict{itr+1}"

    # time one full pass using d[key] lookups
    start = timeit.default_timer()
    traditional(dictx)
    end = timeit.default_timer()
    times['traditional_times'].setdefault(name, []).append(end - start)

    # time one full pass using d.items()
    start = timeit.default_timer()
    hipster(dictx)
    end = timeit.default_timer()
    times['hipster_times'].setdefault(name, []).append(end - start)

print("--d[key]--")
for x in times['traditional_times'].keys():
  ltimes = times['traditional_times'][x]
  mean = statistics.mean(ltimes)
  stdev = statistics.stdev(ltimes)
  print(f"{x} -- mean: {mean}, st. dev: {stdev}")

print("\n--d.items()--")
for x in times['hipster_times'].keys():
  ltimes = times['hipster_times'][x]
  mean = statistics.mean(ltimes)
  stdev = statistics.stdev(ltimes)
  print(f"{x} -- mean: {mean}, st. dev: {stdev}")
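Because the answer mentions limited experience with performance testing, here is a sketch of a more conventional harness built on the standard library's `timeit.repeat`. The test dict (100,000 keys made from `str(i)`) is an arbitrary assumption, not the data used above. The `timeit` documentation suggests that the minimum of several repetitions is usually more informative than the mean and standard deviation, since higher values mostly reflect interference from other processes.

```python
import timeit

# Arbitrary test data (an assumption): 100,000 distinct string keys.
d = {str(i): i for i in range(100_000)}


def traditional(dicty):
    for key in dicty:
        x = dicty[key]
        x = key


def hipster(dicty):
    for key, value in dicty.items():
        y = value
        y = key


NUMBER = 10  # passes over the dict per repetition

results = {}
for func in (traditional, hipster):
    # repeat() returns one total time per repetition, each covering NUMBER calls.
    # The lambda is consumed immediately, so capturing the loop variable is safe here.
    totals = timeit.repeat(lambda: func(d), number=NUMBER, repeat=5)
    results[func.__name__] = min(totals) / NUMBER  # best seconds per single pass

for name, seconds in results.items():
    print(f"{name}: {seconds:.6f} s per pass")
```

The same comparison can also be run from a shell with `python -m timeit`, passing the dict construction as the `-s` setup statement.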


answered Oct 25 '22 20:10

NotAnAmbiTurner

This code only needs to go through the dictionary once to retrieve everything from it:

for key, value in dict.items():

This code goes through the whole dictionary once, but retrieves only keys:

for key in dict:
    x = dict[key]

Then, for each key, it has to go back into the dictionary and perform a second hash-table lookup to fetch the value. So it should be slower.
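The extra trip into the dictionary can be made visible by counting lookups. This sketch uses a hypothetical dict subclass, `CountingDict`, whose `__getitem__` increments a counter. In CPython, plain iteration and `.items()` on such a subclass do not go through the overridden `__getitem__`, so only the explicit `d[key]` calls are counted. The subclass adds Python-level overhead, so it is meant for counting, not for timing.

```python
class CountingDict(dict):
    """Hypothetical dict subclass that counts explicit d[key] lookups."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = 0

    def __getitem__(self, key):
        self.lookups += 1
        return super().__getitem__(key)


def traditional(d):
    for key in d:
        x = d[key]  # second trip into the dict for every key
        x = key


def hipster(d):
    for key, value in d.items():  # key and value come out together
        y = value
        y = key


d = CountingDict((str(i), i) for i in range(1000))

d.lookups = 0
traditional(d)
traditional_lookups = d.lookups  # one extra lookup per key

d.lookups = 0
hipster(d)
hipster_lookups = d.lookups  # no extra lookups

print(traditional_lookups, hipster_lookups)  # 1000 0
```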

Still, the whole thing is purely academic and of little significance in practice. If your application becomes too slow, it is very unlikely that the slowdown is caused by the way you iterate over a dictionary.

answered Oct 25 '22 21:10

zvone
