I'm trying to sort a list of IDs based on a "sorting map" which is an array of (ID1, ID2, timestamp)
tuples that determines which IDs should be sorted before other IDs. Here are the rules:
ID1
should be sorted before ID2
.(C, A, 1/1/1900), (C, B, 1/1/2000)
then B
is sorted before A
. (A, B, 1/1/1950), (B, C, 1/1/1980), (C, A, 1/1/1900)
. The timestamp can can be used to break cycles, with older-timestamped records in a cycle being removed from the sorting map until the cycle is goneExample: given the sorting map (C, A, 1/1/1900), (C, B, 1/1/2000)
, and the list (A, B, C, D)
to sort, the sorted output would be (C, B, A, D)
.
I'm stumped by turning these rules into an algorithm. Here's what I have so far:
Fetch the most current sorting map from the database. I'll get at most one record for every unique pair of IDs.
Remove cycles from the sorting map. How? Or is it easier to simply ignore cycles as part of step 4?
Transform the sorting map in memory for optimal performance. For example, build a hashtable whose key is each unique ID in the sorting map so I can quickly find all sorting map rows that contain a particular ID.
Sort my array of IDs using a generic binary sorting library using a custom comparison function that accepts any two IDs ID1
and ID2
parameters. The comparison function:
a. Look up all sorting map entries containing ID1
or ID2
using the hashtable from step #3.
b. If I already have a record containing both ID1
and ID2
in the sorting map, stop-- we know which one should be first!
c. If neither ID1 nor ID2 are found in the sorting map, then it's a tie. Return a deterministically arbitrary result (e.g. lower ID wins).
d. If one ID is in the sorting map but the other isn't, stop. The found one should be sorted first.
e. If we get to here, both IDs are in the sorting map but there is no direct comparison available in the sorting map. Now what?
Performance is not a big concern because the maximum size of the sorting map is under 20K rows and the maximum number of IDs being sorted is under 30.
Got ideas?
FWIW, we'll be using .NET's List<T>.Sort(Comparison<T>)
to do the sorting in C#, but the underlying algorithm is obviously language- and platform-agnostic.
If you're curious, here's the real-world need for this algorithm:
Our company builds mobile apps for delivery drivers who every day visit about 20 locations out of a territory of 100-150 total locations they are responsible for. The list of locations each day is dynamically assigned based on inventory of each location. Locations which have low inventory get a delivery of new inventory, while locations that still have enough inventory are not visited.
Drivers are free to visit locations in any order, but they usually take similar routes every day (e.g. visit locations in the South part of town when traffic is light in the morning, then visit locations in the North part of town when traffic is heavier down South).
We chose not to use 3rd-party routing software which automatically determines the most efficient driving route. Instead we've found it's better to let the driver choose the route because routing software has a hard time with constraints like "that building's loading dock is usually only free before 7AM" or "the guy who needs to sign the delivery receipt leaves early on Fridays" that have a big impact on delivery schedules.
Anyway, we'd like to use the driver's historical choices to sort each day's itinerary in the same order that the driver visited the same locations last time. This will give the driver a nicely arranged itinerary each day that matches his preferences, without him having to manually rearrange the schedule except in unusual cases. This will save the driver a minute or two each day, which adds up over time.
Each historical itinerary is really a list like this (ID1, ID2, ID3, ..., IDN, timestamp) but as an alternative to storing hundreds of past schedules I was thinking it'd be easier to decompose each N-machine historical itinerary into pairs of machines. This means I have to store, at most, N*N-1 tuples because newer orderings always kick older ones out of the sorting map. If this is a bad simplification, let me know. ;-)
Approach : Zip the two lists. Create a new, sorted list based on the zip using sorted(). Using a list comprehension extract the first elements of each pair from the sorted, zipped list.
Click Home tab > arrow under Sort & Filter, and then click Sort Oldest to Newest, or Sort Newest to Oldest.
To get a copy of the sorted list without modifying the original list, you can use the sorted() function, which returns a new sorted list. To sort the list in reverse order, you can use the reverse parameter for the sort(reverse=True) method.
What you are looking for is called a Topological sorting. Using that search term you can probably find very good resources.
There is one complication in your specific domain: Cycles (because drivers have behaved inconsistently over time). You're right with the fact that you need to break dependency cycles because otherwise the topological sort would fail.
You also need to break all cycles of length bigger than two.
Let's treat you ID-map as a graph: IDs (places) are nodes. Entries in your map are edges (from place ID1 to place ID2). A simple way to do that would be this:
while true
allCycles = getListOfAllCycles();
if (allCycles.length == 0) break;
breakNode = chooseBreakNode(allCycles); //defined later
deleteBreakNodeFrom(allCycles);
chooseBreakNode:
chose the node which has been driven to the least //node is not important
if ambiguous: chose the node in the dependency graph which is present in the highest number of cycles //breaks multiple cycles at once
if ambiguous: chose the node which is in the longest cycle
if ambiguous: pick an arbitrary node
Probably I didn't get chooseBreakNode
quite right. It is a heuristic that you can tune to your needs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With