Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Argsorting values in a list of lists

Tags:

I have a list of lists A of length m. Every list of A contains positive numbers from {1, 2, ..., n}. The following is an example, where m = 3 and n = 4.

A = [[1, 1, 3], [1, 2], [1, 1, 2, 4]]

I represent every number x in A as a pair (i, j) where A[i][j] = x. I would like to sort the numbers in A in non-decreasing order; breaking ties by lowest first index. That is, if A[i1][j1] == A[i2][j2], then (i1, j1) comes before (i2, j2) iff i1 <= i2.

In the example, I would like to return the pairs:

(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)

which represents the sorted numbers

1, 1, 1, 1, 1, 2, 2, 3, 4

What I did is a naive approach that works as follows:

  • First I sort every list in A.
  • Then I iterate the numbers in {1, 2, ..., n} and the list A and add the pairs.

Code:

for i in range(m): 
    A[i].sort()
S = []
for x in range(1, n+1):
    for i in range(m):
        for j in range(len(A[i])):
            if A[i][j] == x:
                S.append((i, j))

I think this approach is not good. Can we do better?

like image 420
zdm Avatar asked May 27 '18 19:05

zdm


People also ask

How do you sort a list according to another list?

Approach : Zip the two lists. Create a new, sorted list based on the zip using sorted(). Using a list comprehension extract the first elements of each pair from the sorted, zipped list.

What does Argsort () do in Python?

Returns the indices that would sort an array. Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as a that index data along the given axis in sorted order.


2 Answers

list.sort

You can generate a list of indexes and then call list.sort with a key:

B = [(i, j) for i, x in enumerate(A) for j, _ in enumerate(x)]
B.sort(key=lambda ix: A[ix[0]][ix[1]])

print(B)
[(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)]

Note that on python-2.x where iterable unpacking in functions is supported, you can simplify the sort call a bit:

B.sort(key=lambda (i, j): A[i][j])

sorted

This is an alternative to the version above, and generates two lists (one in memory which sorted then works on, to return another copy).

B = sorted([
       (i, j) for i, x in enumerate(A) for j, _ in enumerate(x)
    ], 
    key=lambda ix: A[ix[0]][ix[1]]
)

print(B)
[(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)]

Performance

On popular demand, adding some timings and a plot.

enter image description here

From the graph, we see that calling list.sort is more efficient than sorted. This is because list.sort performs an in-place sort, so there is no time/space overhead from creating a copy of the data which sorted has.

Functions

def cs1(A):
    B = [(i, j) for i, x in enumerate(A) for j, _ in enumerate(x)]
    B.sort(key=lambda ix: A[ix[0]][ix[1]]) 

    return B

def cs2(A):
    return sorted([
           (i, j) for i, x in enumerate(A) for j, _ in enumerate(x)
        ], 
        key=lambda ix: A[ix[0]][ix[1]]
    )

def rorydaulton(A):

    triplets = [(x, i, j) for i, row in enumerate(A) for j, x in enumerate(row)]
    pairs = [(i, j) for x, i, j in sorted(triplets)]

    return pairs

def jpp(A):
    def _create_array(data):
        lens = np.array([len(i) for i in data])
        mask = np.arange(lens.max()) < lens[:,None]
        out = np.full(mask.shape, max(map(max, data))+1, dtype=int)  # Pad with max_value + 1
        out[mask] = np.concatenate(data)
        return out

    def _apply_argsort(arr):
        return np.dstack(np.unravel_index(np.argsort(arr.ravel()), arr.shape))[0]

    return _apply_argsort(_create_array(A))[:sum(map(len, A))]

def agngazer(A):
    idx = np.argsort(np.fromiter(chain(*A), dtype=np.int))
    return np.array(
       tuple((i, j) for i, r in enumerate(A) for j, _ in enumerate(r))
    )[idx]

Performance Benchmarking Code

from timeit import timeit
from itertools import chain

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

res = pd.DataFrame(
       index=['cs1', 'cs2', 'rorydaulton', 'jpp', 'agngazer'],
       columns=[10, 50, 100, 500, 1000, 5000, 10000, 50000],
       dtype=float
)

for f in res.index: 
    for c in res.columns:
        l = [[1, 1, 3], [1, 2], [1, 1, 2, 4]] * c
        stmt = '{}(l)'.format(f)
        setp = 'from __main__ import l, {}'.format(f)
        res.at[f, c] = timeit(stmt, setp, number=30)

ax = res.div(res.min()).T.plot(loglog=True) 
ax.set_xlabel("N"); 
ax.set_ylabel("time (relative)");

plt.show();
like image 38
cs95 Avatar answered Oct 20 '22 15:10

cs95


You could make triplets of (x, i, j), sort those triplets, then extract the indices (i, j). That works because the triplets contain all the information needed for the sorting and for inclusion in the final list, in the order needed for the sorting. (This is called the "Decorate-Sort-Undecorate" idiom, related to the Schwartzian transform--Hat-tip to @Morgen for the name and generalization and the motivation for me to explain the generality of this technique.) This could be combined into a single statement but I split it up here for clarity.

A = [[1, 1, 3], [1, 2], [1, 1, 2, 4]]

triplets = [(x, i, j) for i, row in enumerate(A) for j, x in enumerate(row)]
pairs = [(i, j) for x, i, j in sorted(triplets)]
print(pairs)

Here is the printed result:

[(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)]
like image 163
Rory Daulton Avatar answered Oct 20 '22 13:10

Rory Daulton