I'm trying to work out if my problem is solvable using the builtin sorted() function or if I need to do myself - old school using cmp would have been relatively easy.
My data-set looks like:
x = [ ('business', Set('fleet','address')) ('device', Set('business','model','status','pack')) ('txn', Set('device','business','operator')) ....
The sort rule should be basically for all value of N & Y where Y > N, x[N][0] not in x[Y][1]
Although I'm using Python 2.6 where the cmp argument is still available I'm trying to make this Python 3 safe.
So, can this be done using some lambda magic and the key argument?
-== UPDATE ==-
Thanks Eli & Winston! I didn't really think using key would work, or if it could I suspected it would be a shoe horn solution which isn't ideal.
Because my problem was for database table dependencies I had to make a minor addition to Eli's code to remove an item from its list of dependencies (in a well designed database this wouldn't happen, but who lives in that magical perfect world?)
My Solution:
def topological_sort(source):
"""perform topo sort on elements.
:arg source: list of ``(name, set(names of dependancies))`` pairs
:returns: list of names, with dependancies listed first
"""
pending = [(name, set(deps)) for name, deps in source]
emitted = []
while pending:
next_pending = []
next_emitted = []
for entry in pending:
name, deps = entry
deps.difference_update(set((name,)), emitted) # <-- pop self from dep, req Py2.6
if deps:
next_pending.append(entry)
else:
yield name
emitted.append(name) # <-- not required, but preserves original order
next_emitted.append(name)
if not next_emitted:
raise ValueError("cyclic dependancy detected: %s %r" % (name, (next_pending,)))
pending = next_pending
emitted = next_emitted
This works because tuples (and lists) are compared by looking at each element in turn until a difference is found. So when you want to sort on multiple criteria in order of preference, just stick each criterion into an element of a tuple, in the same order.
Python list sort() function can be used to sort a List in ascending, descending, or user-defined order. In each case, the time complexity is O(nlogn) in Python.
In Python, there are two ways, sort() and sorted() , to sort lists ( list ) in ascending or descending order. If you want to sort strings ( str ) or tuples ( tuple ), use sorted() . This article describes the following contents. If you want to reverse or shuffle elements randomly, see the following articles.
The sort() method is a built-in Python method that, by default, sorts the list in ascending order. However, you can modify the order from ascending to descending by specifying the sorting criteria.
What you want is called a topological sort. While it's possible to implement using the builtin sort()
, it's rather awkward, and it's better to implement a topological sort directly in python.
Why is it going to be awkward? If you study the two algorithms on the wiki page, they both rely on a running set of "marked nodes", a concept that's hard to contort into a form sort()
can use, since key=xxx
(or even cmp=xxx
) works best with stateless comparison functions, particularly because timsort doesn't guarantee the order the elements will be examined in. I'm (pretty) sure any solution which does use sort()
is going to end up redundantly calculating some information for each call to the key/cmp function, in order to get around the statelessness issue.
The following is the alg I've been using (to sort some javascript library dependancies):
edit: reworked this greatly, based on Winston Ewert's solution
def topological_sort(source):
"""perform topo sort on elements.
:arg source: list of ``(name, [list of dependancies])`` pairs
:returns: list of names, with dependancies listed first
"""
pending = [(name, set(deps)) for name, deps in source] # copy deps so we can modify set in-place
emitted = []
while pending:
next_pending = []
next_emitted = []
for entry in pending:
name, deps = entry
deps.difference_update(emitted) # remove deps we emitted last pass
if deps: # still has deps? recheck during next pass
next_pending.append(entry)
else: # no more deps? time to emit
yield name
emitted.append(name) # <-- not required, but helps preserve original ordering
next_emitted.append(name) # remember what we emitted for difference_update() in next pass
if not next_emitted: # all entries have unmet deps, one of two things is wrong...
raise ValueError("cyclic or missing dependancy detected: %r" % (next_pending,))
pending = next_pending
emitted = next_emitted
Sidenote: it is possible to shoe-horn a cmp()
function into key=xxx
, as outlined in this python bug tracker message.
I do a topological sort something like this:
def topological_sort(items):
provided = set()
while items:
remaining_items = []
emitted = False
for item, dependencies in items:
if dependencies.issubset(provided):
yield item
provided.add(item)
emitted = True
else:
remaining_items.append( (item, dependencies) )
if not emitted:
raise TopologicalSortFailure()
items = remaining_items
I think its a little more straightforward than Eli's version, I don't know about efficiency.
Over looking bad formatting and this strange Set
type... (I've kept them as tuples and delimited the list items correctly...) ... and using the networkx
library to make things convenient...
x = [
('business', ('fleet','address')),
('device', ('business','model','status','pack')),
('txn', ('device','business','operator'))
]
import networkx as nx
g = nx.DiGraph()
for key, vals in x:
for val in vals:
g.add_edge(key, val)
print nx.topological_sort(g)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With