Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sorting based on previous sorts

I'm trying to sort a list of IDs based on a "sorting map" which is an array of (ID1, ID2, timestamp) tuples that determines which IDs should be sorted before other IDs. Here are the rules:

  • ID1 should be sorted before ID2.
  • the timestamp can be used to break ties with newer timestamps beating older timestamps. e.g. given sorting key (C, A, 1/1/1900), (C, B, 1/1/2000) then B is sorted before A.
  • there can be cycles e.g. (A, B, 1/1/1950), (B, C, 1/1/1980), (C, A, 1/1/1900). The timestamp can can be used to break cycles, with older-timestamped records in a cycle being removed from the sorting map until the cycle is gone
  • if an ID is not present in the sorting map, it's sorted after any ID that is present in the sorting map

Example: given the sorting map (C, A, 1/1/1900), (C, B, 1/1/2000), and the list (A, B, C, D) to sort, the sorted output would be (C, B, A, D).

I'm stumped by turning these rules into an algorithm. Here's what I have so far:

  1. Fetch the most current sorting map from the database. I'll get at most one record for every unique pair of IDs.

  2. Remove cycles from the sorting map. How? Or is it easier to simply ignore cycles as part of step 4?

  3. Transform the sorting map in memory for optimal performance. For example, build a hashtable whose key is each unique ID in the sorting map so I can quickly find all sorting map rows that contain a particular ID.

  4. Sort my array of IDs using a generic binary sorting library using a custom comparison function that accepts any two IDs ID1 and ID2 parameters. The comparison function:

    a. Look up all sorting map entries containing ID1 or ID2 using the hashtable from step #3.

    b. If I already have a record containing both ID1 and ID2 in the sorting map, stop-- we know which one should be first!

    c. If neither ID1 nor ID2 are found in the sorting map, then it's a tie. Return a deterministically arbitrary result (e.g. lower ID wins).

    d. If one ID is in the sorting map but the other isn't, stop. The found one should be sorted first.

    e. If we get to here, both IDs are in the sorting map but there is no direct comparison available in the sorting map. Now what?

Performance is not a big concern because the maximum size of the sorting map is under 20K rows and the maximum number of IDs being sorted is under 30.

Got ideas?

FWIW, we'll be using .NET's List<T>.Sort(Comparison<T>) to do the sorting in C#, but the underlying algorithm is obviously language- and platform-agnostic.


If you're curious, here's the real-world need for this algorithm:

Our company builds mobile apps for delivery drivers who every day visit about 20 locations out of a territory of 100-150 total locations they are responsible for. The list of locations each day is dynamically assigned based on inventory of each location. Locations which have low inventory get a delivery of new inventory, while locations that still have enough inventory are not visited.

Drivers are free to visit locations in any order, but they usually take similar routes every day (e.g. visit locations in the South part of town when traffic is light in the morning, then visit locations in the North part of town when traffic is heavier down South).

We chose not to use 3rd-party routing software which automatically determines the most efficient driving route. Instead we've found it's better to let the driver choose the route because routing software has a hard time with constraints like "that building's loading dock is usually only free before 7AM" or "the guy who needs to sign the delivery receipt leaves early on Fridays" that have a big impact on delivery schedules.

Anyway, we'd like to use the driver's historical choices to sort each day's itinerary in the same order that the driver visited the same locations last time. This will give the driver a nicely arranged itinerary each day that matches his preferences, without him having to manually rearrange the schedule except in unusual cases. This will save the driver a minute or two each day, which adds up over time.

Each historical itinerary is really a list like this (ID1, ID2, ID3, ..., IDN, timestamp) but as an alternative to storing hundreds of past schedules I was thinking it'd be easier to decompose each N-machine historical itinerary into pairs of machines. This means I have to store, at most, N*N-1 tuples because newer orderings always kick older ones out of the sorting map. If this is a bad simplification, let me know. ;-)

like image 250
Justin Grant Avatar asked Aug 31 '12 05:08

Justin Grant


People also ask

How do you sort a list on basis of another list?

Approach : Zip the two lists. Create a new, sorted list based on the zip using sorted(). Using a list comprehension extract the first elements of each pair from the sorted, zipped list.

How do you sort data from oldest to newest?

Click Home tab > arrow under Sort & Filter, and then click Sort Oldest to Newest, or Sort Newest to Oldest.

How do I sort list without changing original?

To get a copy of the sorted list without modifying the original list, you can use the sorted() function, which returns a new sorted list. To sort the list in reverse order, you can use the reverse parameter for the sort(reverse=True) method.


1 Answers

What you are looking for is called a Topological sorting. Using that search term you can probably find very good resources.

There is one complication in your specific domain: Cycles (because drivers have behaved inconsistently over time). You're right with the fact that you need to break dependency cycles because otherwise the topological sort would fail.

You also need to break all cycles of length bigger than two.

Let's treat you ID-map as a graph: IDs (places) are nodes. Entries in your map are edges (from place ID1 to place ID2). A simple way to do that would be this:

while true
 allCycles = getListOfAllCycles();
 if (allCycles.length == 0) break;
 breakNode = chooseBreakNode(allCycles); //defined later
 deleteBreakNodeFrom(allCycles);

chooseBreakNode:
 chose the node which has been driven to the least //node is not important
 if ambiguous: chose the node in the dependency graph which is present in the highest number of cycles //breaks multiple cycles at once
 if ambiguous: chose the node which is in the longest cycle
 if ambiguous: pick an arbitrary node

Probably I didn't get chooseBreakNode quite right. It is a heuristic that you can tune to your needs.

like image 50
usr Avatar answered Sep 28 '22 06:09

usr