Is it possible to sort in python 3 using buffer-like (pointer-based) string comparisons?

Tags: python, python-3.x

Consider the problem of sorting all the suffixes of a string, where a suffix is the substring from some index i to the end of the string. Instead of creating a list of the sorted suffixes, we can create a list of the indices corresponding to the starting points of the sorted suffixes. Then we can do something like this:

text = "..."  # some text string
sortedIndices = sorted(range(len(text)),
                       key=lambda i: text[i:])

This works for short strings, but if the string is sufficiently long we'll run out of memory, because the key function makes a copy of each suffix and all the keys are generated at the outset. In Python 2.7 there's a slick way around this, namely the buffer() function:

sortedIndices = sorted(range(len(text)),
                       key=lambda i: buffer(text, i))

In this case the key is just a pointer into the text string, so the total memory needed is much lower (O(n) vs. O(n*n)), and it works with much longer strings. This works beautifully in 2.7, but in 3.x the buffer() function has been removed in favor of memoryview, which, unlike buffer, does not (AFAIK) support pointer-based string comparisons (i.e., without using the tobytes method, which creates a copy of the string). My question is: is there any way to do something similar in Python 3.x?
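To make the memoryview limitation concrete, here is a minimal sketch. The helper name memoryview_supports_ordering is made up for illustration. It assumes CPython 3, where memoryview objects can be sliced without copying and compared with ==, but an ordering comparison such as < raises TypeError.

```python
def memoryview_supports_ordering():
    """Return True if two memoryview objects can be compared with <."""
    data = b"banana"
    left = memoryview(data)[1:]   # zero-copy view of b"anana"
    right = memoryview(data)[3:]  # zero-copy view of b"ana"
    try:
        left < right
    except TypeError:
        # Ordering is not implemented for memoryview; only == and != are.
        return False
    return True


print(memoryview_supports_ordering())
```

On CPython 3 this prints False, which is why memoryview cannot be dropped in as a sort key the way buffer() could.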

asked Jan 22 '14 01:01

user3065699

1 Answer

It looks to me like memoryview doesn't do that. That might actually be a good thing.

You can still do this with a class, which is more object-oriented anyway:

#!/usr/local/cpython-3.3/bin/python

import sys
import functools

@functools.total_ordering
class Suffix_comparison:
    """Order strings by their suffix starting at starting_position."""

    def __init__(self, string, starting_position):
        self.string = string
        self.starting_position = starting_position

    def __lt__(self, other):
        # Both sides must be sliced with a trailing colon. Each slice is a
        # temporary copy that is freed once the comparison finishes.
        return self.string[self.starting_position:] < other.string[other.starting_position:]

    def __eq__(self, other):
        return self.string[self.starting_position:] == other.string[other.starting_position:]

    def __str__(self):
        return self.string

    __repr__ = __str__

def main():
    # Sort the lines of stdin by the part of each line from index 5 onward.
    list_ = []
    for line in sys.stdin:
        stripped_line = line.rstrip('\n')
        list_.append(Suffix_comparison(stripped_line, 5))

    list_.sort()

    for line in list_:
        print(line)

main()
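For the suffix-sorting problem in the question, the same idea can be used as a key function. The following is only a sketch under assumptions: SuffixKey, sorted_suffix_indices and the starting chunk size of 16 are hypothetical names and values, not part of the code above. Instead of slicing the whole suffix, it compares short chunks that double in size, so the temporary copies stay small unless two suffixes share a long common prefix.

```python
import functools


@functools.total_ordering
class SuffixKey:
    """Hypothetical sort key for the suffix of `text` starting at `start`."""

    __slots__ = ("text", "start")

    def __init__(self, text, start):
        self.text = text
        self.start = start

    def _cmp(self, other):
        # Return a negative, zero, or positive number, like a classic cmp().
        a, b = self.text, other.text
        i, j = self.start, other.start
        size = 16  # assumed initial chunk size; doubles after each equal chunk
        while i < len(a) and j < len(b):
            x, y = a[i:i + size], b[j:j + size]
            if x != y:
                return -1 if x < y else 1
            i += size
            j += size
            size *= 2
        # All compared chunks were equal: the shorter suffix sorts first.
        return (len(a) - i) - (len(b) - j)

    def __lt__(self, other):
        return self._cmp(other) < 0

    def __eq__(self, other):
        return self._cmp(other) == 0


def sorted_suffix_indices(text):
    """Return the start indices of the suffixes of `text` in sorted order."""
    return sorted(range(len(text)), key=lambda i: SuffixKey(text, i))


print(sorted_suffix_indices("banana"))  # [5, 3, 1, 0, 4, 2]
```

Only one small key object is stored per index, so the keys themselves take O(n) memory. The comparison logic runs as Python code on every comparison, so this is likely to be slower than the buffer() version from 2.7.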

answered Nov 14 '22 22:11

dstromberg
