I was coding a Euler problem, and I ran into question that sparked my curiosity. I have two snippets of code. One is with lists the other uses dictionaries.
using lists:
n=100000
num=[]
suma=0
for i in range(n,1,-1):
tmp=tuple(set([n for n in factors(i)]))
if len(tmp) != 2: continue
if tmp not in num:
num.append(tmp)
suma+=i
using dictionaries:
n=100000
num={}
suma=0
for i in range(n,1,-1):
tmp=tuple(set([n for n in factors(i)]))
if len(tmp) != 2: continue
if tmp not in num:
num[tmp]=i
suma+=i
I am only concerned about performance. Why does the second example using dictionaries run incredibly fast, faster than the first example with lists. the example with dictionaries runs almost thirty-fold faster!
I tested these 2 code using n=1000000, and the first code run in 1032 seconds and the second one run in just 3.3 second,,, amazin'!
The reason is because a dictionary is a lookup, while a list is an iteration. Dictionary uses a hash lookup, while your list requires walking through the list until it finds the result from beginning to the result each time.
A little benchmark shows me that iterating a list is definately faster.
It is more efficient to use dictionaries for the lookup of elements as it is faster than a list and takes less time to traverse. Moreover, lists keep the order of the elements while dictionary does not. So, it is wise to use a list data structure when you are concerned with the order of the data elements.
While I was aware that dicts are faster than arrays, I was amazed at the difference. DGP Is most of the difference due to [array unset] versus [dict unset] ? Note that [aray unset] operates on a pattern argument that must be matched against all elements of the array.
In Python, the average time complexity of a dictionary key lookup is O(1), since they are implemented as hash tables. The time complexity of lookup in a list is O(n) on average. In your code, this makes a difference in the line if tmp not in num:
, since in the list case, Python needs to search through the whole list to detect membership, whereas in the dict case it does not except for the absolute worst case.
For more details, check out TimeComplexity.
If it's about speed, you should not create any lists:
n = 100000
factors = ((frozenset(factors(i)), i) for i in range(2, n+1))
num = {k:v for k,v in factors if len(k)==2}
suma = sum(num.values())
In a list, the code if tmp not in num:
is O(n), while it is O(lgn) in dict.
Edit: The dict is based on hashing, so it is much quicker than liner list search. Thanks @user2357112 for point this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With