Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given an unsorted python list, how can I find the minimum set of movements required to sort it

Tags:

python

sorting

I have a list of items stored in a remote database which may be unsorted, and I want to sort them. The database accepts commands of them form:

move item1 before item2
move item3 after item2

So, given a list of the form:

[1,3,2,7,6,0,4]

...how can I get the sequence of moves:

move 2 before 3
move 7 after 6
move 0 before 1
move 4 before 6

I assume a modification of the bubblesort algorithm would work, but I'm specifically looking for the most efficient implementation that is still pythonic, and that generates the fewest move commands.

UPDATE: the list is 1000-10000 long, and all items are unique - no repeats. Only a very small number of items - 1-10 - will be in the wrong place at any given time. Time is a concern - it should take seconds, not minutes - but it does not have to be extremely fast.

UPDATE 2: I would also like to move each item only once

like image 279
Realist Avatar asked Jan 28 '14 15:01

Realist


People also ask

How do you sort an unsorted list in Python?

In Python, you can sort a list using the built-in list. sort() method or the built-in sorted() function. The sorted() function creates a new sorted list, while the list. sort() method sorts the list in place.

How do you sort list from smallest to largest in Python?

Summary. Use the Python List sort() method to sort a list in place. The sort() method sorts the string elements in alphabetical order and sorts the numeric elements from smallest to largest. Use the sort(reverse=True) to reverse the default sort order.

How do you sort a list in Python for loop?

STEP 1: Declare and initialize an array. STEP 2: Loop through the array and select an element. STEP 3: The inner loop will be used to compare the selected element from the outer loop with the rest of the elements of the array. STEP 4: If any element is less than the selected element then swap the values.


Video Answer


3 Answers

As you want to reduce the number of move sequences, the optimal approach I can think of is to use binary search on a sorted list to determine the insertion point of each element. If any of the element is already in its correct position, you need not move it.

This will generate n - d sequence moves where n is the number of elements and d is the number of elements in its correct position.

  • For an already sorted list, number of sequence moves are n - d = n - n = 0
  • For a list where all the elements are in wrong position, number of sequence moves are n - d = n - 0 = n

Implementation

def gen_move(seq):
    from bisect import bisect_left
    out = seq[0:1]
    for elem in seq[1:]:
        index = bisect_left(out, elem)
        if seq[index] != elem:
            if index == 0:
                print "Move {} before {}".format(elem, out[index])
            else:
                print "Move {} after {}".format(elem, out[index - 1])
        out.insert(index, elem)
    print out

Demo

gen_move([1,3,2,7,6,0,4])
Move 2 after 1
Move 6 after 3
Move 0 before 1
Move 4 after 3
[0, 1, 2, 3, 4, 6, 7]

gen_move(range(10)[::-1])
Move 8 before 9
Move 7 before 8
Move 6 before 7
Move 5 before 6
Move 4 before 5
Move 3 before 4
Move 2 before 3
Move 1 before 2
Move 0 before 1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

gen_move(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Performance

In [5]: %timeit gen_move(range(10000, 0, -1))
10000 loops, best of 3: 84 us per loop

Time Complexity

sum(1 ln 1 + 2 ln 2 + 3 ln 3 + ..... n ln n) < O(n ln n)

Space Complexity

O(n)
like image 127
Abhijit Avatar answered Sep 20 '22 17:09

Abhijit


  1. Get data from remote database
  2. Sort them (just simple sort)
  3. Use difflib SequenceMatcher.get_opcodes to get replace/delete/insert/skip operations that transform original list to sorted list
  4. Transform these operations to "move X after/before Y" operations

I am a little bit worried about time complexity of difflib, so you should benchmark it for the expected data size. Faster alternative could be rolling-checksum algorithm (like librsync).

like image 43
Messa Avatar answered Sep 21 '22 17:09

Messa


This is pretty naive, but seems to work:

xs = [1,3,2,7,6,0,4]
ys = []

for x in xs:
    for n, y in enumerate(ys):
        if x < y:
            print x, 'before', y
            ys.insert(n, x)
            break
    else:
        ys.append(x)

Result:

2 before 3
6 before 7
0 before 1
4 before 6
like image 33
georg Avatar answered Sep 19 '22 17:09

georg