Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance in Python 3 dictionary iteration: dict[key] vs. dict.items()

Which of these is faster, and why? Or are they the same? Does the answer vary by any conditions (size of dictionary, type of data, etc.)?

Traditional:

for key in dict:
    x = dict[key]
    x = key

Hipster:

for key, value in dict.items():
    y = value
    y = key

I haven't seen an exact duplicate, but if there is one I'd be happy to be pointed to it.

like image 645
NotAnAmbiTurner Avatar asked Nov 18 '18 23:11

NotAnAmbiTurner


People also ask

Is iterating through a dictionary faster than a list?

Using iteritems is a tad bit faster... But the time to create a view is negligable; it is actually slower to iterate over than a list. This means that in Python 3, if you want to iterate many times over the items in a dictionary, and performance is critical, you can get a 30% speedup by caching the view as a list.

Should I use dict () or {}?

With CPython 2.7, using dict() to create dictionaries takes up to 6 times longer and involves more memory allocation operations than the literal syntax. Use {} to create dictionaries, especially if you are pre-populating them, unless the literal syntax does not work for your case.

Are dictionaries slow in Python?

Dictionaries in Python will find keys in O(1) on average. But complexity is not the only factor in execution time. For example accessing a list item by its index is also O(1) but it is considerably faster than accessing a dictionary by its key.

Does dict items () return a list?

The methods dict. keys() and dict. values() return lists of the keys or values explicitly. There's also an items() which returns a list of (key, value) tuples, which is the most efficient way to examine all the key value data in the dictionary.

How do I iterate through a dictionary in Python?

Take the following dictionary as an example. You can iterate the keys by using dict directly in the for statement. As mentioned above, you can iterate the keys by using dict directly but you can also use keys (). The result is the same, but keys () may make the intent clearer to the reader of the code.

What is the difference between Python 2 and Python 3 iterations?

But iterating over the list itself is basically the same as iterating over any other list: Python 3 can create and iterate over items faster than Python 2 can (compare to 57.3 us above):

Are Python dictionaries iterated in the same order?

In Python 3.6 and beyond, the keys and values of a dictionary are iterated over in the same order in which they were created. However, this behavior may vary across different Python versions, and it depends on the dictionary’s history of insertions and deletions.

What is a dict in Python?

In Python 3.7, the items-ordered feature of dict objects was declared an official part of the Python language specification. So, from that point on, developers could rely on dict when they needed a dictionary that keeps its items ordered.


2 Answers

It turns out there are actually orders of magnitude of difference.

I don't know much about performance testing, but what I tried to do was create 3 dicts of varying sizes, with each smaller dict being a subset of the larger dict. I then ran all three dicts through the two functions (Traditional vs. Hipster). Then I did that 100 times.

The dictionary sizes (number of key-value pairs) for dict1, dict2, and dict3 are 1000, 50000, 500000 respectively.

There seems to be a significant difference, with d.items() being generally faster and d.items() being WAY faster on smaller dictionaries. This is in line with expectations (Python generally rewarding "pythonic" code).

Results:

--d[key]--
dict1 -- mean: 0.0001113555802294286, st. dev: 1.9951038526222054e-05
dict2 -- mean: 0.01669296698019025, st. dev: 0.019088713496142
dict3 -- mean: 0.2553815016898443, st. dev: 0.02778986771642094

--d.items()--
dict1 -- mean: 6.005059978633653e-05, st. dev: 1.1960199272812617e-05
dict2 -- mean: 0.00507106617995305, st. dev: 0.009871762371401046
dict3 -- mean: 0.07369932165958744, st. dev: 0.023440325168927384

Code (repl.it) providing results:

import timeit
import random
import statistics

def traditional(dicty):

  for key in dicty:
    x = dicty[key]
    x = key

def hipster(dicty):

  for key, value in dicty.items():
    y = value
    y = key

def generate_random_dicts():
  random_dict1, random_dict2, random_dict3 = {}, {}, {}

  for _ in range(1000):
    key = generate_random_str_one_to_ten_chars()
    val = generate_random_str_one_to_ten_chars()
    random_dict1[key] = val
    random_dict2[key] = val
    random_dict3[key] = val

  for _ in range(49000):
    key = generate_random_str_one_to_ten_chars()
    val = generate_random_str_one_to_ten_chars()
    random_dict2[key] = val
    random_dict3[key] = val

  for _ in range(450000):
    key = generate_random_str_one_to_ten_chars()
    val = generate_random_str_one_to_ten_chars()
    random_dict3[key] = val

  return [random_dict1, random_dict2, random_dict3]

def generate_random_str_one_to_ten_chars():
  ret_str = ""
  for x in range(random.randrange(1,10,1)):
    ret_str += chr(random.randrange(40,126,1))
  return ret_str

dict1, dict2, dict3 = generate_random_dicts()

test_dicts = [dict1, dict2, dict3]

times = {}
times['traditional_times'] = {}
times['hipster_times'] = {}

for _ in range(100):

  for itr, dictx in enumerate(test_dicts):
    start = timeit.default_timer() 
    traditional(dictx)
    end = timeit.default_timer() 
    time = end - start
    try:
      times['traditional_times'][f"dict{itr+1}"].append(time)
    except KeyError:
      times['traditional_times'][f"dict{itr+1}"] = [time]

    start = timeit.default_timer() 
    hipster(dictx)
    end = timeit.default_timer() 
    time = end - start
    try:
      times['hipster_times'][f"dict{itr+1}"].append(time)
    except KeyError:
      times['hipster_times'][f"dict{itr+1}"] = [time]

print("--d[key]--")
for x in times['traditional_times'].keys():
  ltimes = times['traditional_times'][x]
  mean = statistics.mean(ltimes)
  stdev = statistics.stdev(ltimes)
  print(f"{x} -- mean: {mean}, st. dev: {stdev}\n\n")

print("--d.items()--")
for x in times['hipster_times'].keys():
  ltimes = times['hipster_times'][x]
  mean = statistics.mean(ltimes)
  stdev = statistics.stdev(ltimes)
  print(f"{x} -- mean: {mean}, st. dev: {stdev}")
like image 192
NotAnAmbiTurner Avatar answered Oct 25 '22 20:10

NotAnAmbiTurner


This code only needs to go through the dictionary once to retrieve everything from it:

for key, value in dict.items():

This code goes through the whole dictionary once, but retrieves only keys:

for key in dict:
    x = dict[key]

Then, for each key, it has to go into the dictionary again to look up the value. So, it has to be slower.

Still, the whole thing is purely academic and of no real significance in real life. When your application starts being too slow, it is really very very unlikely that the slowness will be caused by the way you iterate through a dictionary.

like image 22
zvone Avatar answered Oct 25 '22 21:10

zvone