Efficiency: 2D-list to dictionary in python

Question

I have a 2D-list.

l = [[100, 1], [43, 2], [201, 1], [5, 7], ...]

I want to convert the list into a dictionary where the second element serves as key. The value of each key should be a list of all first elements, that have the key as second element. The dictionary for this sample list should look like that:

{
    1: [100, 201],
    2: [43],
    7: [5],
    ...
}

I have two solutions for this conversion. Which one of them is more efficient and why? Also: Is there another more efficient solution?

Solution1:

d = {}
for elem in l:
    if elem[1] in d:
        d[elem[1]].append(elem[0])
    else:
        d[elem[1]] = [elem[0]]

Solution2:

d = {}

for elem in l:
    d[elem[1]] = []

for elem in l:
    d[elem[1]].append(elem[0])

kabanus · Accepted Answer

There's nothing wrong with you're two solutions, they're fine and efficient:

Solution 1 iterates the list once, but performs a dictionary look up twice. Since this is about O(1), and you have 2, I would say this is 2xN.
Solution 2 iterates the list twice, with one lookup each time - again 2xN.

From a theoretical perspective these are identical. Run time I would wager they are very close as well. A more pythonic way (asking for forgiveness rather than permission ,thanks @tobias_k) that may improve the first solution is:

d = {}
for elem in l:
    try:
        d[elem[1]].append(elem[0])
    except KeyError:
        d[elem[1]] = [elem[0]]

If you have many repeating keys, this would be better, since the overhead of an exception would be much smaller than all the ifs (and only one lookup), so this would depend on the actual list in question. If you go for this solution you may want to read up on a defaultdict.

Some new info

My guess was wrong! even though theoretically they are the same, and it seems like the amount of operations are the same, there is a difference. I'm guessing this has to do with Python's ability to optimize if statements.

Marking your first method a, your second one b, and the one I provided c, using the timeit module I timed these methods for two lists:

l1=[(x,y) for x,y in zip(range(1000),range(1000,2000)]
l1=[(x,2) for x,y in range(1000)]

Results for l1:

Method a is fastest. 30% slower is b, and another 30% from that is c.

Results for l2:

Methods a and b are nearly identical (a still a bit faster), and c is a bit faster than both (which we expected).

I would say for practical purposes the first version is better than the second, but the 3d would be best if you have a lot of repeating keys. bottom line, theory is all well and good, but the best method practically is list dependent.

Sum of N upto Infinity · Answer

The 1st Solution is more efficient as it takes O(n) time to complete: The if statement is O(1) since dictionary lookup is O(1).

The 2nd Solution is also O(n) but it runs twice, so O(n+n).

And in the 1st Solution, you could skip the if statement and directly append the elem[0], and add a try except block.

Efficiency: 2D-list to dictionary in python

Tags:

performance

python

dictionary

list

python-3.x

sinaj

2 Answers

kabanus

Sum of N upto Infinity

Recent Activity

Donate For Us

Efficiency: 2D-list to dictionary in python

Tags:

performance

python

dictionary

list

python-3.x

sinaj

2 Answers

kabanus

Sum of N upto Infinity

Related questions

Recent Activity

Donate For Us