Which of these is faster, and why? Or are they the same? Does the answer vary by any conditions (size of dictionary, type of data, etc.)?
Traditional:
for key in dict:
    x = dict[key]
    x = key
Hipster:
for key, value in dict.items():
    y = value
    y = key
I haven't seen an exact duplicate, but if there is one I'd be happy to be pointed to it.
Using iteritems in Python 2 is a tad bit faster. In Python 3, items() returns a view instead; the time to create the view is negligible, but the view is actually slower to iterate over than a list. This means that in Python 3, if you want to iterate many times over the items in a dictionary, and performance is critical, you can get a roughly 30% speedup by caching the view's contents as a list.
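A minimal sketch of that caching trick, assuming a throwaway 10,000-item dictionary (the sizes and loop counts here are arbitrary; the exact speedup varies by machine and Python version):

import timeit

d = {i: str(i) for i in range(10_000)}  # throwaway example dict
cached = list(d.items())                # cache the view's contents as a list

view_time = timeit.timeit(lambda: sum(1 for _ in d.items()), number=1000)
list_time = timeit.timeit(lambda: sum(1 for _ in cached), number=1000)
print(f"iterating the view: {view_time:.3f}s")
print(f"iterating the list: {list_time:.3f}s")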
With CPython 2.7, using dict() to create dictionaries takes up to 6 times longer and involves more memory allocation operations than the literal syntax. Use {} to create dictionaries, especially if you are pre-populating them, unless the literal syntax does not work for your case.
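A quick way to check that on your own interpreter (Python 3 syntax here; the exact ratio varies by version, but the dict() call also pays function-call overhead):

import timeit

literal_time = timeit.timeit("{'a': 1, 'b': 2, 'c': 3}", number=1_000_000)
call_time = timeit.timeit("dict(a=1, b=2, c=3)", number=1_000_000)
print(f"{{}} literal: {literal_time:.3f}s")
print(f"dict() call: {call_time:.3f}s")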
Dictionaries in Python will find keys in O(1) on average. But complexity is not the only factor in execution time. For example, accessing a list item by its index is also O(1), but it is considerably faster than accessing a dictionary by its key.
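A rough illustration of that constant-factor gap (the container sizes are arbitrary, and the numbers will differ per machine):

import timeit

lst = list(range(1000))
dct = {i: i for i in range(1000)}
# Both lookups are O(1), but the list wins on constant factors.
list_time = timeit.timeit("lst[500]", globals={"lst": lst}, number=1_000_000)
dict_time = timeit.timeit("dct[500]", globals={"dct": dct}, number=1_000_000)
print(f"list index:  {list_time:.3f}s")
print(f"dict lookup: {dict_time:.3f}s")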
The methods dict.keys() and dict.values() return the keys or values explicitly (as lists in Python 2, and as view objects in Python 3). There is also an items() method, which returns (key, value) pairs; it is the most efficient way to examine all the key-value data in the dictionary.
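A quick look at all three, using a made-up two-item dictionary (wrapping the Python 3 views in list() to show their contents):

d = {'a': 1, 'b': 2}
print(list(d.keys()))    # ['a', 'b']
print(list(d.values()))  # [1, 2]
print(list(d.items()))   # [('a', 1), ('b', 2)]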
Take the following dictionary as an example. You can iterate over the keys by using the dict directly in the for statement. As mentioned above, you can also use keys(); the result is the same, but keys() may make the intent clearer to the reader of the code.
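The example dictionary itself did not survive here, so this is a stand-in with made-up keys and values, showing both spellings:

d = {'key1': 1, 'key2': 2, 'key3': 3}
for k in d:          # iterating the dict directly yields its keys
    print(k)
for k in d.keys():   # same result, arguably clearer intent
    print(k)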
Back to the cached-list trick: iterating over the cached list is basically the same as iterating over any other list, and Python 3 can create and iterate over the items faster than Python 2 can.
In Python 3.6 and beyond, the keys and values of a dictionary are iterated over in insertion order (in 3.6 this was still a CPython implementation detail). In earlier versions the order is arbitrary: it may vary across Python versions, and it depends on the dictionary's history of insertions and deletions.
In Python 3.7, insertion-order preservation for dict objects was declared an official part of the Python language specification. So, from that point on, developers could rely on dict when they needed a dictionary that keeps its items ordered.
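A small demonstration of the guarantee, including how a deletion plus reinsertion affects the order:

d = {}
d['first'] = 1
d['third'] = 3
d['second'] = 2
print(list(d))  # ['first', 'third', 'second'] -- insertion order
del d['third']
d['third'] = 3
print(list(d))  # ['first', 'second', 'third'] -- reinsertion moves it to the end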
It turns out there actually is a significant difference.
I don't know much about performance testing, but what I tried to do was create 3 dicts of varying sizes, with each smaller dict being a subset of the larger ones. I then ran all three dicts through the two functions above (Traditional vs. Hipster), and repeated the whole run 100 times.
The dictionary sizes (number of key-value pairs) for dict1, dict2, and dict3 are 1000, 50000, 500000 respectively.
There seems to be a significant difference, with d.items() being consistently faster, and its relative advantage growing on the larger dictionaries. This is in line with expectations (Python generally rewarding "pythonic" code).
Results:
--d[key]--
dict1 -- mean: 0.0001113555802294286, st. dev: 1.9951038526222054e-05
dict2 -- mean: 0.01669296698019025, st. dev: 0.019088713496142
dict3 -- mean: 0.2553815016898443, st. dev: 0.02778986771642094
--d.items()--
dict1 -- mean: 6.005059978633653e-05, st. dev: 1.1960199272812617e-05
dict2 -- mean: 0.00507106617995305, st. dev: 0.009871762371401046
dict3 -- mean: 0.07369932165958744, st. dev: 0.023440325168927384
The code (run on repl.it) that produced these results:
import timeit
import random
import statistics

def traditional(dicty):
    # "Traditional": iterate the keys, then index back into the dict.
    for key in dicty:
        x = dicty[key]
        x = key

def hipster(dicty):
    # "Hipster": unpack (key, value) pairs from items().
    for key, value in dicty.items():
        y = value
        y = key

def generate_random_dicts():
    # dict1 is a subset of dict2, which is a subset of dict3.
    random_dict1, random_dict2, random_dict3 = {}, {}, {}
    for _ in range(1000):
        key = generate_random_str_one_to_ten_chars()
        val = generate_random_str_one_to_ten_chars()
        random_dict1[key] = val
        random_dict2[key] = val
        random_dict3[key] = val
    for _ in range(49000):
        key = generate_random_str_one_to_ten_chars()
        val = generate_random_str_one_to_ten_chars()
        random_dict2[key] = val
        random_dict3[key] = val
    for _ in range(450000):
        key = generate_random_str_one_to_ten_chars()
        val = generate_random_str_one_to_ten_chars()
        random_dict3[key] = val
    return [random_dict1, random_dict2, random_dict3]

def generate_random_str_one_to_ten_chars():
    ret_str = ""
    for x in range(random.randrange(1, 11)):  # 1 to 10 characters, as the name says
        ret_str += chr(random.randrange(40, 126))  # printable ASCII
    return ret_str

dict1, dict2, dict3 = generate_random_dicts()
test_dicts = [dict1, dict2, dict3]
times = {}
times['traditional_times'] = {}
times['hipster_times'] = {}
for _ in range(100):
    for itr, dictx in enumerate(test_dicts):
        start = timeit.default_timer()
        traditional(dictx)
        end = timeit.default_timer()
        time = end - start
        try:
            times['traditional_times'][f"dict{itr+1}"].append(time)
        except KeyError:
            times['traditional_times'][f"dict{itr+1}"] = [time]
        start = timeit.default_timer()
        hipster(dictx)
        end = timeit.default_timer()
        time = end - start
        try:
            times['hipster_times'][f"dict{itr+1}"].append(time)
        except KeyError:
            times['hipster_times'][f"dict{itr+1}"] = [time]
print("--d[key]--")
for x in times['traditional_times'].keys():
    ltimes = times['traditional_times'][x]
    mean = statistics.mean(ltimes)
    stdev = statistics.stdev(ltimes)
    print(f"{x} -- mean: {mean}, st. dev: {stdev}\n\n")
print("--d.items()--")
for x in times['hipster_times'].keys():
    ltimes = times['hipster_times'][x]
    mean = statistics.mean(ltimes)
    stdev = statistics.stdev(ltimes)
    print(f"{x} -- mean: {mean}, st. dev: {stdev}")
This code only needs to go through the dictionary once to retrieve everything from it:
for key, value in dict.items():
This code goes through the whole dictionary once, but retrieves only keys:
for key in dict:
    x = dict[key]
Then, for each key, it has to go back into the dictionary to look up the value. Each of those lookups is O(1) on average, but it is still an extra hash-and-probe step per key, so it has to be slower.
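If you want to see the extra work, dis makes the additional subscript lookup visible (the bytecode names vary by CPython version, but the per-key subscript only appears in the first loop):

import dis
dis.dis("for k in d:\n    x = d[k]")          # includes an extra per-key subscript lookup
dis.dis("for k, v in d.items():\n    x = v")  # no per-key subscript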
Still, the whole thing is purely academic and of no real significance in real life. When your application starts being too slow, it is very unlikely that the slowness is caused by the way you iterate through a dictionary.