Is it possible to sort in Python 3 using buffer-like (pointer-based) string comparisons?

Consider the problem of sorting all the suffixes of a string, where a suffix is the substring from some index i to the end of the string. Instead of creating a list of the sorted suffixes, we can create a list of the indices corresponding to the starting points of the sorted suffixes. Then we can do something like this:

text = ... some text string ...
sortedIndices = sorted(range(len(text)), key=lambda i: text[i:])
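
For concreteness, here is the naive version run on a small sample input ("banana" is an illustrative choice, not from the question), together with a hypothetical helper, naive_key_chars, that computes how many characters all the text[i:] keys hold at once. That sum, n + (n-1) + ... + 1 = n(n+1)/2, is where the O(n*n) memory figure comes from:

```python
def naive_suffix_sort(text):
    # Each key text[i:] is a fresh copy of the suffix, and sorted() builds all keys first.
    return sorted(range(len(text)), key=lambda i: text[i:])


def naive_key_chars(n):
    # Total characters held by the keys text[i:] for i in range(n): n + (n-1) + ... + 1.
    return n * (n + 1) // 2


print(naive_suffix_sort("banana"))  # [5, 3, 1, 0, 4, 2]
print(naive_key_chars(1_000_000))   # 500000500000
```

For a one-million-character string that is about 5 * 10**11 characters held by the keys simultaneously, i.e. hundreds of gigabytes even at CPython's 1 byte per character for ASCII-only str objects.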

This works for short strings, but if the string is sufficiently long, we'll run out of memory: the key function returns a copy of each suffix, and sorted() computes all the keys at the outset. In Python 2.7 there's a slick way around this, namely, the buffer() function:

sortedIndices = sorted(range(len(text)), key=lambda i: buffer(text, i))

In this case, the key is just a pointer into the text string, so the total memory needed is much less (O(n) vs O(n*n)). Hence, it will work with much longer strings. This works beautifully in 2.7, but in 3.x the buffer() function has been removed in favor of memoryview, which, unlike buffer, doesn't (AFAIK) support pointer-based string comparisons, i.e., comparisons that avoid the tobytes() method, which creates a copy of the string. My question is: is there any way to do something similar in Python 3.x?
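
A quick way to check that limitation in Python 3 (the helper names and the sample byte string below are illustrative only): memoryview accepts bytes-like objects but not str, slicing it gives zero-copy views, and those views support == but raise TypeError for <, so they cannot serve as sort keys the way buffer objects could.

```python
def supports_ordering(x, y):
    """Return True if `x < y` can be evaluated, False if it raises TypeError."""
    try:
        x < y
    except TypeError:
        return False
    return True


def can_view(obj):
    """Return True if memoryview() accepts obj, i.e. it exposes the buffer protocol."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


view = memoryview(b"banana")
suffix_1 = view[1:]  # zero-copy view of b"anana"
suffix_3 = view[3:]  # zero-copy view of b"ana"

print(suffix_1 == b"anana")                   # True: equality is supported
print(supports_ordering(suffix_1, suffix_3))  # False: '<' raises TypeError
print(can_view(b"banana"))                    # True
print(can_view("banana"))                     # False: str has no buffer interface
```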

user3065699 asked Jan 22 '14 01:01


1 Answer

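It looks to me like memoryview doesn't do that: in Python 3 a memoryview can only wrap bytes-like objects (not str), and it supports == and != but not ordering comparisons such as <. That might actually be a good thing.

You can still do this with a class, which is more object oriented anyway. Each object holds only a reference to the string and a starting position; the suffix is sliced inside each comparison and discarded right afterwards, so only a couple of suffix copies exist at any one time rather than one per element: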

#!/usr/local/cpython-3.3/bin/python

import sys
import functools

@functools.total_ordering
class Suffix_comparison:
    """Orders objects by string[starting_position:] without storing that suffix."""

    def __init__(self, string, starting_position):
        self.string = string
        self.starting_position = starting_position

    def suffix(self):
        # Temporary copy of the suffix; it is discarded after each comparison.
        return self.string[self.starting_position:]

    def __lt__(self, other):
        return self.suffix() < other.suffix()

    def __eq__(self, other):
        return self.suffix() == other.suffix()

    def __str__(self):
        return self.string

    __repr__ = __str__

def main():
    list_ = []
    for line in sys.stdin:
        stripped_line = line.rstrip('\n')
        # Order the input lines by their suffix starting at index 5.
        list_.append(Suffix_comparison(stripped_line, 5))

    list_.sort()

    for line in list_:
        print(line)

if __name__ == '__main__':
    main()
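
Applied directly to the question's suffix-sorting problem, the class idea can also avoid copying whole suffixes during comparisons. The sketch below is a variant, not part of the answer above: the hypothetical SuffixKey class compares two suffixes a fixed-size chunk at a time (CHUNK is an arbitrary, hypothetical tuning value), so each key stores only a string reference and an index, and each comparison allocates at most two CHUNK-length slices. It relies on list.sort() and sorted() using only < between items, so no other comparison methods are defined.

```python
class SuffixKey:
    """Sort key standing for text[start:] without copying the whole suffix."""

    __slots__ = ("text", "start")
    CHUNK = 256  # hypothetical tuning constant: characters compared per step

    def __init__(self, text, start):
        self.text = text
        self.start = start

    def __lt__(self, other):
        s, t = self.text, other.text
        i, j = self.start, other.start
        n, m, step = len(s), len(t), self.CHUNK
        while i < n and j < m:
            a = s[i:i + step]  # temporary slice of at most `step` characters
            b = t[j:j + step]
            if a != b:
                # The first difference, or a proper-prefix relation, lies in this chunk.
                return a < b
            i += step
            j += step
        # All compared chunks matched: the suffix that ran out first is the smaller one.
        return i >= n and j < m


def suffix_array(text):
    """Starting indices of the suffixes of text, in sorted suffix order."""
    return sorted(range(len(text)), key=lambda i: SuffixKey(text, i))


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

Comparing in chunks trades full-suffix copies for a Python-level loop, so comparisons between suffixes that share very long common prefixes get slower; a larger CHUNK means fewer loop iterations at the cost of bigger temporary slices.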
dstromberg answered Nov 14 '22 22:11