Sorting a list of lists of dictionaries in Python

I have an object that is a list of lists of dictionaries:

myObject =[[{ "play": 5.00, "id": 1, "uid": "abc" },  \
            { "play": 1.00, "id": 2, "uid": "def" }], \
           [{ "play": 6.00, "id": 3, "uid": "ghi" },  \
            { "play": 7.00, "id": 4, "uid": "jkl" }], \
           [{ "play": 3.00, "id": 5, "uid": "mno" },  \
            { "play": 1.00, "id": 6, "uid": "pqr" }]]

I want to sort the list by the sum of play values in the dictionaries of each nested list. The object would then be sorted like this:

myObject =[[{ "play": 6.00, "id": 3, "uid": "ghi" },  \
            { "play": 7.00, "id": 4, "uid": "jkl" }], \
           [{ "play": 5.00, "id": 1, "uid": "abc" },  \
            { "play": 1.00, "id": 2, "uid": "def" }], \
           [{ "play": 3.00, "id": 5, "uid": "mno" },  \
            { "play": 1.00, "id": 6, "uid": "pqr" }]]

If it were just a list of dicts then:

sorted(myObject, key=sum(map(itemgetter(play))), reverse=True)

would work. I can't figure out how to do this without looping over the list, calculating the sums, and then sorting. That is what I am doing now, but I'm trying to make this code more efficient by removing loops, because my list contains hundreds of millions of lists.

asked Feb 28 '16 22:02 by jdesilvio



1 Answer

Your idea is already a good one: use a custom key function when sorting, built from sum, map and an itemgetter on the play key:

key=sum(map(itemgetter(play)))

There is a problem, though: the key argument expects a function that takes one item of the list being sorted and returns its sort key. Neither sum nor map returns a function, so the result of that expression cannot serve as a key function. Instead, wrap the combination in a lambda that evaluates it for each item.

The other problems are that play should be the string 'play', and that map needs the sublist as its second argument. With itemgetter imported from the operator module, your key function would look like this:

key=lambda x: sum(map(itemgetter('play'), x))
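For reference, here is a minimal self-contained sketch of that key function in use, with the import it needs. The names groups, get_play and play_total are made up for illustration, and the data is the sample from the question.

```python
from operator import itemgetter

# Sample data from the question, under a hypothetical name.
groups = [
    [{"play": 5.00, "id": 1, "uid": "abc"}, {"play": 1.00, "id": 2, "uid": "def"}],
    [{"play": 6.00, "id": 3, "uid": "ghi"}, {"play": 7.00, "id": 4, "uid": "jkl"}],
    [{"play": 3.00, "id": 5, "uid": "mno"}, {"play": 1.00, "id": 6, "uid": "pqr"}],
]

# itemgetter("play") builds a callable equivalent to: lambda d: d["play"]
get_play = itemgetter("play")

def play_total(sublist):
    """Sum the 'play' values of all dictionaries in one sublist."""
    return sum(map(get_play, sublist))

# sorted() calls play_total on each sublist and orders by the result.
ordered = sorted(groups, key=play_total, reverse=True)

# Collect the ids per sublist to make the resulting order easy to read.
first_ids = [[d["id"] for d in sub] for sub in ordered]
print(first_ids)
```

A named function is used here instead of a lambda only so that it can be reused and tested; the behavior is the same.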

This is, by the way, functionally equivalent to the following generator expression, which might be more readable:

key=lambda x: sum(y['play'] for y in x)
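If you want to convince yourself that the two variants agree, a quick check along these lines should do. The names with_map, with_genexp and sublist are hypothetical and only used for this sketch.

```python
from operator import itemgetter

# One sublist taken from the question's sample data.
sublist = [{"play": 6.00, "id": 3, "uid": "ghi"},
           {"play": 7.00, "id": 4, "uid": "jkl"}]

# Variant 1: map + itemgetter.
with_map = lambda x: sum(map(itemgetter("play"), x))

# Variant 2: generator expression.
with_genexp = lambda x: sum(y["play"] for y in x)

total_map = with_map(sublist)
total_genexp = with_genexp(sublist)
print(total_map == total_genexp)

# Both variants return 0 for an empty sublist, since sum() of nothing is 0.
empty_map = with_map([])
empty_genexp = with_genexp([])
```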

Using this with sorted should work, but consider sorting the list directly with list.sort instead, which sorts in place rather than building a new list:

>>> myObject = [[{ "play": 5.00, "id": 1, "uid": "abc" },
...              { "play": 1.00, "id": 2, "uid": "def" }],
...             [{ "play": 6.00, "id": 3, "uid": "ghi" },
...              { "play": 7.00, "id": 4, "uid": "jkl" }],
...             [{ "play": 3.00, "id": 5, "uid": "mno" },
...              { "play": 1.00, "id": 6, "uid": "pqr" }]]

>>> myObject.sort(key=lambda x: sum(y['play'] for y in x), reverse=True)

>>> for x in myObject:
...     print(x)
...

[{'play': 6.0, 'uid': 'ghi', 'id': 3}, {'play': 7.0, 'uid': 'jkl', 'id': 4}]
[{'play': 5.0, 'uid': 'abc', 'id': 1}, {'play': 1.0, 'uid': 'def', 'id': 2}]
[{'play': 3.0, 'uid': 'mno', 'id': 5}, {'play': 1.0, 'uid': 'pqr', 'id': 6}]

(Btw. myObject is kind of a bad name for a list of things.)
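The practical difference between sorted and list.sort can be sketched as follows: sorted builds and returns a new outer list, while list.sort reorders the existing list and returns None. With a very large list, skipping that extra outer list may matter. The names groups and play_total are hypothetical, and the dictionaries are trimmed to the play key for brevity.

```python
def play_total(sublist):
    """Sum the 'play' values of all dictionaries in one sublist."""
    return sum(d["play"] for d in sublist)

groups = [
    [{"play": 5.0}, {"play": 1.0}],
    [{"play": 6.0}, {"play": 7.0}],
    [{"play": 3.0}, {"play": 1.0}],
]

# sorted() leaves the original untouched and returns a new outer list.
copy_sorted = sorted(groups, key=play_total, reverse=True)
is_same_object = copy_sorted is groups
original_first_total = play_total(groups[0])

# list.sort() reorders the list in place and returns None.
result = groups.sort(key=play_total, reverse=True)
totals = [play_total(s) for s in groups]
print(is_same_object, result, totals)
```

Note that in both cases the inner lists and dictionaries are not copied; only the outer list is duplicated by sorted.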


As far as the efficiency or complexity of your problem goes, you cannot avoid looping through every sublist at some point. The sum of those values cannot be determined without looking at each value.

However, you should make sure that every sum is calculated only once, so that the items in the sublists are not inspected more than once. Luckily, list.sort guarantees exactly that; the Python documentation for it states:

The key corresponding to each item in the list is calculated once and then used for the entire sorting process.

So you will have a very efficient solution for this sorting problem.
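One way to observe the "calculated once" behavior yourself is to count how often the key function runs. This is only a sketch with made-up data; counter, counted_play_total and groups are hypothetical names.

```python
# Mutable counter so the key function can record how often it is called.
counter = {"calls": 0}

def counted_play_total(sublist):
    """Same key as before, but it also counts its own invocations."""
    counter["calls"] += 1
    return sum(d["play"] for d in sublist)

# 1000 made-up sublists of two dictionaries each.
groups = [[{"play": float(i % 7)}, {"play": float(i % 3)}] for i in range(1000)]

groups.sort(key=counted_play_total, reverse=True)

# The key function ran once per sublist, not once per comparison.
print(counter["calls"], len(groups))
```

The comparisons performed during the sort then work on the precomputed sums, so each sublist is only summed a single time.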

answered Oct 23 '22 03:10 by poke
