Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiency: 2D-list to dictionary in python

I have a 2D-list.

l = [[100, 1], [43, 2], [201, 1], [5, 7], ...]

I want to convert the list into a dictionary where the second element serves as key. The value of each key should be a list of all first elements, that have the key as second element. The dictionary for this sample list should look like that:

{
    1: [100, 201],
    2: [43],
    7: [5],
    ...
}

I have two solutions for this conversion. Which one of them is more efficient and why? Also: Is there another more efficient solution?

Solution1:

d = {}
for elem in l:
    if elem[1] in d:
        d[elem[1]].append(elem[0])
    else:
        d[elem[1]] = [elem[0]]

Solution2:

d = {}

for elem in l:
    d[elem[1]] = []

for elem in l:
    d[elem[1]].append(elem[0])
like image 673
sinaj Avatar asked Feb 09 '18 11:02

sinaj


2 Answers

There's nothing wrong with you're two solutions, they're fine and efficient:

  1. Solution 1 iterates the list once, but performs a dictionary look up twice. Since this is about O(1), and you have 2, I would say this is 2xN.
  2. Solution 2 iterates the list twice, with one lookup each time - again 2xN.

From a theoretical perspective these are identical. Run time I would wager they are very close as well. A more pythonic way (asking for forgiveness rather than permission ,thanks @tobias_k) that may improve the first solution is:

d = {}
for elem in l:
    try:
        d[elem[1]].append(elem[0])
    except KeyError:
        d[elem[1]] = [elem[0]]

If you have many repeating keys, this would be better, since the overhead of an exception would be much smaller than all the ifs (and only one lookup), so this would depend on the actual list in question. If you go for this solution you may want to read up on a defaultdict.

Some new info

My guess was wrong! even though theoretically they are the same, and it seems like the amount of operations are the same, there is a difference. I'm guessing this has to do with Python's ability to optimize if statements.

Marking your first method a, your second one b, and the one I provided c, using the timeit module I timed these methods for two lists:

l1=[(x,y) for x,y in zip(range(1000),range(1000,2000)]
l1=[(x,2) for x,y in range(1000)]

Results for l1:

Method a is fastest. 30% slower is b, and another 30% from that is c.

Results for l2:

Methods a and b are nearly identical (a still a bit faster), and c is a bit faster than both (which we expected).

I would say for practical purposes the first version is better than the second, but the 3d would be best if you have a lot of repeating keys. bottom line, theory is all well and good, but the best method practically is list dependent.

like image 169
kabanus Avatar answered Oct 10 '22 12:10

kabanus


The 1st Solution is more efficient as it takes O(n) time to complete: The if statement is O(1) since dictionary lookup is O(1).

The 2nd Solution is also O(n) but it runs twice, so O(n+n).

And in the 1st Solution, you could skip the if statement and directly append the elem[0], and add a try except block.

like image 20
Sum of N upto Infinity Avatar answered Oct 10 '22 11:10

Sum of N upto Infinity