I have a list of lists A
of length m
. Every list of A
contains positive numbers from {1, 2, ..., n}
. The following is an example, where m = 3
and n = 4
.
A = [[1, 1, 3], [1, 2], [1, 1, 2, 4]]
I represent every number x
in A
as a pair (i, j)
where A[i][j] = x
. I would like to sort the numbers in A
in non-decreasing order; breaking ties by lowest first index. That is, if A[i1][j1] == A[i2][j2]
, then (i1, j1)
comes before (i2, j2)
iff i1 <= i2
.
In the example, I would like to return the pairs:
(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)
which represents the sorted numbers
1, 1, 1, 1, 1, 2, 2, 3, 4
What I did is a naive approach that works as follows:
A
.{1, 2, ..., n}
and the list A
and add the pairs. Code:
for i in range(m):
A[i].sort()
S = []
for x in range(1, n+1):
for i in range(m):
for j in range(len(A[i])):
if A[i][j] == x:
S.append((i, j))
I think this approach is not good. Can we do better?
Approach : Zip the two lists. Create a new, sorted list based on the zip using sorted(). Using a list comprehension extract the first elements of each pair from the sorted, zipped list.
Returns the indices that would sort an array. Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as a that index data along the given axis in sorted order.
list.sort
You can generate a list of indexes and then call list.sort
with a key
:
B = [(i, j) for i, x in enumerate(A) for j, _ in enumerate(x)]
B.sort(key=lambda ix: A[ix[0]][ix[1]])
print(B)
[(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)]
Note that on python-2.x where iterable unpacking in functions is supported, you can simplify the sort
call a bit:
B.sort(key=lambda (i, j): A[i][j])
sorted
This is an alternative to the version above, and generates two lists (one in memory which sorted
then works on, to return another copy).
B = sorted([
(i, j) for i, x in enumerate(A) for j, _ in enumerate(x)
],
key=lambda ix: A[ix[0]][ix[1]]
)
print(B)
[(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)]
On popular demand, adding some timings and a plot.
From the graph, we see that calling list.sort
is more efficient than sorted
. This is because list.sort
performs an in-place sort, so there is no time/space overhead from creating a copy of the data which sorted
has.
Functions
def cs1(A):
B = [(i, j) for i, x in enumerate(A) for j, _ in enumerate(x)]
B.sort(key=lambda ix: A[ix[0]][ix[1]])
return B
def cs2(A):
return sorted([
(i, j) for i, x in enumerate(A) for j, _ in enumerate(x)
],
key=lambda ix: A[ix[0]][ix[1]]
)
def rorydaulton(A):
triplets = [(x, i, j) for i, row in enumerate(A) for j, x in enumerate(row)]
pairs = [(i, j) for x, i, j in sorted(triplets)]
return pairs
def jpp(A):
def _create_array(data):
lens = np.array([len(i) for i in data])
mask = np.arange(lens.max()) < lens[:,None]
out = np.full(mask.shape, max(map(max, data))+1, dtype=int) # Pad with max_value + 1
out[mask] = np.concatenate(data)
return out
def _apply_argsort(arr):
return np.dstack(np.unravel_index(np.argsort(arr.ravel()), arr.shape))[0]
return _apply_argsort(_create_array(A))[:sum(map(len, A))]
def agngazer(A):
idx = np.argsort(np.fromiter(chain(*A), dtype=np.int))
return np.array(
tuple((i, j) for i, r in enumerate(A) for j, _ in enumerate(r))
)[idx]
Performance Benchmarking Code
from timeit import timeit
from itertools import chain
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
res = pd.DataFrame(
index=['cs1', 'cs2', 'rorydaulton', 'jpp', 'agngazer'],
columns=[10, 50, 100, 500, 1000, 5000, 10000, 50000],
dtype=float
)
for f in res.index:
for c in res.columns:
l = [[1, 1, 3], [1, 2], [1, 1, 2, 4]] * c
stmt = '{}(l)'.format(f)
setp = 'from __main__ import l, {}'.format(f)
res.at[f, c] = timeit(stmt, setp, number=30)
ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");
plt.show();
You could make triplets of (x, i, j)
, sort those triplets, then extract the indices (i, j)
. That works because the triplets contain all the information needed for the sorting and for inclusion in the final list, in the order needed for the sorting. (This is called the "Decorate-Sort-Undecorate" idiom, related to the Schwartzian transform--Hat-tip to @Morgen for the name and generalization and the motivation for me to explain the generality of this technique.) This could be combined into a single statement but I split it up here for clarity.
A = [[1, 1, 3], [1, 2], [1, 1, 2, 4]]
triplets = [(x, i, j) for i, row in enumerate(A) for j, x in enumerate(row)]
pairs = [(i, j) for x, i, j in sorted(triplets)]
print(pairs)
Here is the printed result:
[(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (1, 1), (2, 2), (0, 2), (2, 3)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With