Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Logically sorting a list of lists (partially ordered set -> topological sort)

Tags:

python

Edit The accepted answer works for sets that satisfy the requirements of a strict partially ordered set, so that a directed acyclic graph can be constructed:

  • irreflexivity not a < a: the list does not contain items like ['a','a']
  • transitivity if a < b and b < c then a < c: the list does not contain items like ['a','b'],['b','c'],['c','a']
  • asymmetryif a < b then not b < a: the list does not contain items like ['a','b'],['b','a']

Take this list of lists:
[['b', 'c'], ['a', 'c'], ['b', 'a'], ['a', 'c', 'd'], ]
and flatten it to a single list, sorted depending on a value's neighbor(s):

  • the first sublist tells you that b comes before c
  • then a before c
  • b before a
  • and finally d after c

The overall ordering between sublists is consistent, meaning there won't be sublists like these: ['b','c'],['c','b']. So the result should be: ['b', 'a', 'c', 'd']

After a (long) while I came up with this ugly mess:

def get_order(_list):
    order = _list[0]
    for sublist in _list[1:]:
        if not sublist:
            continue
        if len(sublist) == 1:
            if sublist[0] not in order:
                order.append(sublist[0])
            continue
        new_order = order.copy()
        for index, value in enumerate(sublist):
            inserted = False
            new_order_index = None
            if value in new_order:
                new_order_index = new_order.index(value)
                new_order.remove(value)
            for previous_value in sublist[:index][::-1]:
                if previous_value in new_order:
                    insert_index = new_order.index(previous_value) + 1
                    print('inserting', value, 'at position', insert_index, 'after', previous_value)
                    new_order.insert(insert_index, value)
                    inserted = True
                    break
            if inserted:
                continue
            for next_value in sublist[index:]:
                if next_value in new_order:
                    insert_index = new_order.index(next_value)
                    print('inserting', value, 'at position', insert_index, 'before', next_value)
                    new_order.insert(insert_index, value)
                    inserted = True
                    break
            if inserted:
                continue
            if new_order_index is None:
                print('appending', value)
                new_order.append(value)
            else:
                print('leaving', value, 'at position', new_order_index)
                new_order.insert(new_order_index, value)
        order = new_order
    return order

if __name__ == '__main__':
    test_list = [['b', 'c'], ['a', 'c'], ['b', 'a'], ['a', 'c', 'd'], ]
    order = get_order(test_list)
    #>>> inserting a at position 1 before c
    #>>> inserting c at position 2 after a
    #>>> inserting d at position 3 after c
    #>>> inserting a at position 1 before c
    #>>> inserting c at position 2 after a
    #>>> inserting b at position 0 before a
    #>>> inserting a at position 1 after b
    print(order)
    #>>> ['b', 'a', 'c', 'd']

It appears to do exactly as expected, but it is far from efficient (or elegant for that matter).
Are there any algorithms that can sort like that?
Or are there some pythonic tricks that would make this more efficient?


like image 574
CoffeeBasedLifeform Avatar asked Jul 19 '18 18:07

CoffeeBasedLifeform


People also ask

Which sorting is used for topological sorting?

Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices such that for every directed edge u v, vertex u comes before v in the ordering.

What is meant by topological sorting?

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering.

What is topological sort Poset?

A Total Ordering of a Poset The topological sort is a solution to scheduling problems, and it is built on the two concepts previously discussed: partial ordering and total ordering. At its core, a topological sort, or a linear extension, is a total ordering of a partially ordered set.

How does DFS and topological sort work?

Topological sort simply involves running DFS on an entire graph and adding each node to the global ordering of nodes, but only after all of a node's children are visited. This ensures that parent nodes will be ordered before their child nodes, and honors the forward direction of edges in the ordering.


2 Answers

You can create a lookup function that determines if a particular value should be placed before or after another value:

d = [['b', 'c'], ['a', 'c'], ['b', 'a'], ['a', 'c', 'd']]
flattened = {i for b in d for i in b}
def _lookup(a, b):
  _loc = [i for i in d if a in i and b in i]
  return True if not _loc else _loc[0].index(a) < _loc[0].index(b)

class T: 
  def __init__(self, _val):
    self.v = _val
  def __lt__(self, _n):
    return _lookup(self.v, _n.v)

final_result = [i.v for i in sorted(map(T, flattened))]

Output:

['b', 'a', 'c', 'd']

Using [['b', 'c'], ['a', 'c'], ['b', 'a'], ['a', 'c', 'd'], ['a', 'e']]:

['b', 'a', 'c', 'e', 'd']
like image 73
Ajax1234 Avatar answered Oct 20 '22 19:10

Ajax1234


The existing answers by nosklo and Ajax1234 both fail on an input of [[1, 3], [3, 5], [5, 2], [2, 4]]. The attempt in your question fails on an input of [[1, 4], [2, 3], [3, 4], [1, 2]].

The correct approach is as described by BowlingHawk95: perform a topological sort on the directed acyclic graph induced by your input list.

We could implement our own topological sort, but it's safer to let an existing graph library handle it. For example, NetworkX:

from itertools import chain, tee

import networkx
import networkx.algorithms

# pairwise recipe from the itertools docs.
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

def merge_ordering(sublists):
    # Make an iterator of graph edges for the new graph. Some edges may be repeated.
    # That's fine. NetworkX will ignore duplicates.
    edges = chain.from_iterable(map(pairwise, sublists))

    graph = networkx.DiGraph(edges)
    return list(networkx.algorithms.topological_sort(graph))

This produces correct output for the input in the question, the [[1, 3], [3, 5], [5, 2], [2, 4]] case the other answers failed on, and the [[1, 4], [2, 3], [3, 4], [1, 2]] case your attempt failed on:

>>> merge_ordering([[1, 3], [3, 5], [5, 2], [2, 4]])
[1, 3, 5, 2, 4]
>>> merge_ordering([['b', 'c'], ['a', 'c'], ['b', 'a'], ['a', 'c', 'd']])
['b', 'a', 'c', 'd']
>>> merge_ordering([[1, 4], [2, 3], [3, 4], [1, 2]])
[1, 2, 3, 4]

We can also write a version that raises an error if the input list doesn't uniquely determine the flattened form:

def merge_ordering_unique(sublists):
    # Make an iterator of graph edges for the new graph. Some edges may be repeated.
    # That's fine. NetworkX will ignore duplicates.
    edges = chain.from_iterable(map(pairwise, sublists))

    graph = networkx.DiGraph(edges)
    merged = list(networkx.algorithms.topological_sort(graph))

    for a, b in pairwise(merged):
        if not graph.has_edge(a, b):
            raise ValueError('Input has multiple possible topological orderings.')

    return merged

Demo:

>>> merge_ordering_unique([['b', 'c'], ['a', 'c'], ['b', 'a'], ['a', 'c', 'd']])
['b', 'a', 'c', 'd']
>>> merge_ordering_unique([[1, 3, 4], [1, 2, 4]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 11, in merge_ordering_unique
ValueError: Input has multiple possible topological orderings.
like image 20
user2357112 supports Monica Avatar answered Oct 20 '22 17:10

user2357112 supports Monica