I have a 2D-list.
l = [[100, 1], [43, 2], [201, 1], [5, 7], ...]
I want to convert the list into a dictionary where the second element serves as key. The value of each key should be a list of all first elements, that have the key as second element. The dictionary for this sample list should look like that:
{
1: [100, 201],
2: [43],
7: [5],
...
}
I have two solutions for this conversion. Which one of them is more efficient and why? Also: Is there another more efficient solution?
Solution1:
d = {}
for elem in l:
if elem[1] in d:
d[elem[1]].append(elem[0])
else:
d[elem[1]] = [elem[0]]
Solution2:
d = {}
for elem in l:
d[elem[1]] = []
for elem in l:
d[elem[1]].append(elem[0])
There's nothing wrong with you're two solutions, they're fine and efficient:
From a theoretical perspective these are identical. Run time I would wager they are very close as well. A more pythonic way (asking for forgiveness rather than permission ,thanks @tobias_k) that may improve the first solution is:
d = {}
for elem in l:
try:
d[elem[1]].append(elem[0])
except KeyError:
d[elem[1]] = [elem[0]]
If you have many repeating keys, this would be better, since the overhead of an exception would be much smaller than all the ifs (and only one lookup), so this would depend on the actual list in question. If you go for this solution you may want to read up on a defaultdict.
Some new info
My guess was wrong! even though theoretically they are the same, and it seems like the amount of operations are the same, there is a difference. I'm guessing this has to do with Python's ability to optimize if
statements.
Marking your first method a
, your second one b
, and the one I provided c
, using the timeit module I timed these methods for two lists:
l1=[(x,y) for x,y in zip(range(1000),range(1000,2000)]
l1=[(x,2) for x,y in range(1000)]
Results for l1
:
Method a
is fastest. 30% slower is b
, and another 30% from that is c
.
Results for l2
:
Methods a
and b
are nearly identical (a still a bit faster), and c
is a bit faster than both (which we expected).
I would say for practical purposes the first version is better than the second, but the 3d would be best if you have a lot of repeating keys. bottom line, theory is all well and good, but the best method practically is list dependent.
The 1st Solution is more efficient as it takes O(n) time to complete: The if statement is O(1) since dictionary lookup is O(1).
The 2nd Solution is also O(n) but it runs twice, so O(n+n).
And in the 1st Solution, you could skip the if statement and directly append the elem[0], and add a try except block.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With