Argsorting values in a list of lists

Q: How do you sort a list according to another list?

Approach : Zip the two lists. Create a new, sorted list based on the zip using sorted(). Using a list comprehension extract the first elements of each pair from the sorted, zipped list.

Q: What does Argsort () do in Python?

Returns the indices that would sort an array. Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as a that index data along the given axis in sorted order.

Tags:

I have a list of lists A of length m. Every list of A contains positive numbers from {1, 2, ..., n}. The following is an example, where m = 3 and n = 4.

A = [[1, 1, 3], [1, 2], [1, 1, 2, 4]]

I represent every number x in A as a pair (i, j) where A[i][j] = x. I would like to sort the numbers in A in non-decreasing order; breaking ties by lowest first index. That is, if A[i1][j1] == A[i2][j2], then (i1, j1) comes before (i2, j2) iff i1 <= i2.

In the example, I would like to return the pairs:

(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)

which represents the sorted numbers

1, 1, 1, 1, 1, 2, 2, 3, 4

What I did is a naive approach that works as follows:

First I sort every list in A.
Then I iterate the numbers in {1, 2, ..., n} and the list A and add the pairs.

Code:

for i in range(m): 
    A[i].sort()
S = []
for x in range(1, n+1):
    for i in range(m):
        for j in range(len(A[i])):
            if A[i][j] == x:
                S.append((i, j))

I think this approach is not good. Can we do better?

420

asked May 27 '18 19:05

zdm

2 Answers

`list.sort`

You can generate a list of indexes and then call list.sort with a key:

B = [(i, j) for i, x in enumerate(A) for j, _ in enumerate(x)]
B.sort(key=lambda ix: A[ix[0]][ix[1]])

print(B)
[(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)]

Note that on python-2.x where iterable unpacking in functions is supported, you can simplify the sort call a bit:

B.sort(key=lambda (i, j): A[i][j])

`sorted`

This is an alternative to the version above, and generates two lists (one in memory which sorted then works on, to return another copy).

B = sorted([
       (i, j) for i, x in enumerate(A) for j, _ in enumerate(x)
    ], 
    key=lambda ix: A[ix[0]][ix[1]]
)

print(B)
[(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)]

Performance

On popular demand, adding some timings and a plot.

enter image description here

From the graph, we see that calling list.sort is more efficient than sorted. This is because list.sort performs an in-place sort, so there is no time/space overhead from creating a copy of the data which sorted has.

Functions

def cs1(A):
    B = [(i, j) for i, x in enumerate(A) for j, _ in enumerate(x)]
    B.sort(key=lambda ix: A[ix[0]][ix[1]]) 

    return B

def cs2(A):
    return sorted([
           (i, j) for i, x in enumerate(A) for j, _ in enumerate(x)
        ], 
        key=lambda ix: A[ix[0]][ix[1]]
    )

def rorydaulton(A):

    triplets = [(x, i, j) for i, row in enumerate(A) for j, x in enumerate(row)]
    pairs = [(i, j) for x, i, j in sorted(triplets)]

    return pairs

def jpp(A):
    def _create_array(data):
        lens = np.array([len(i) for i in data])
        mask = np.arange(lens.max()) < lens[:,None]
        out = np.full(mask.shape, max(map(max, data))+1, dtype=int)  # Pad with max_value + 1
        out[mask] = np.concatenate(data)
        return out

    def _apply_argsort(arr):
        return np.dstack(np.unravel_index(np.argsort(arr.ravel()), arr.shape))[0]

    return _apply_argsort(_create_array(A))[:sum(map(len, A))]

def agngazer(A):
    idx = np.argsort(np.fromiter(chain(*A), dtype=np.int))
    return np.array(
       tuple((i, j) for i, r in enumerate(A) for j, _ in enumerate(r))
    )[idx]

Performance Benchmarking Code

from timeit import timeit
from itertools import chain

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

res = pd.DataFrame(
       index=['cs1', 'cs2', 'rorydaulton', 'jpp', 'agngazer'],
       columns=[10, 50, 100, 500, 1000, 5000, 10000, 50000],
       dtype=float
)

for f in res.index: 
    for c in res.columns:
        l = [[1, 1, 3], [1, 2], [1, 1, 2, 4]] * c
        stmt = '{}(l)'.format(f)
        setp = 'from __main__ import l, {}'.format(f)
        res.at[f, c] = timeit(stmt, setp, number=30)

ax = res.div(res.min()).T.plot(loglog=True) 
ax.set_xlabel("N"); 
ax.set_ylabel("time (relative)");

plt.show();

answered Oct 20 '22 15:10

cs95

You could make triplets of (x, i, j), sort those triplets, then extract the indices (i, j). That works because the triplets contain all the information needed for the sorting and for inclusion in the final list, in the order needed for the sorting. (This is called the "Decorate-Sort-Undecorate" idiom, related to the Schwartzian transform--Hat-tip to @Morgen for the name and generalization and the motivation for me to explain the generality of this technique.) This could be combined into a single statement but I split it up here for clarity.

A = [[1, 1, 3], [1, 2], [1, 1, 2, 4]]

triplets = [(x, i, j) for i, row in enumerate(A) for j, x in enumerate(row)]
pairs = [(i, j) for x, i, j in sorted(triplets)]
print(pairs)

Here is the printed result:

[(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)]

163

answered Oct 20 '22 13:10

Rory Daulton

Related questions
                            
                                Which takes priority: logback.xml or logback-test.xml?
                            
                                Azure ServiceBus Message Serialization/Deserialization
                            
                                How to import keras.engine.topology in Tensorflow?
                            
                                Difference between readonly keyword/Expression-bodied members in c#?, which is better?
                            
                                Visual Studio code with PyLint and autoPep8: How can I avoid PyLint to complain about my line length?
                            
                                Can I replace an existing method of an object in Python? [duplicate]
                            
                                How to resolve: ModuleNotFoundError: No module named 'google.colab'
                            
                                Why is my position:sticky not working on iOS?
                            
                                How to ignore SSL certificate validation in node requests?
                            
                                Does static make a difference for a const local variable?
                            
                                Can I use pagespeed insights for my local host website or offline?
                            
                                Problem running tests with enabled preview features in surefire and failsafe

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With