Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Longest common prefix using buffer?

If I have an input string and an array:

s = "to_be_or_not_to_be" 
pos = [15, 2, 8]

I am trying to find the longest common prefix between the consecutive elements of the array pos referencing the original s. I am trying to get the following output:

longest = [3,1]

The way I obtained this is by computing the longest common prefix of the following pairs:

  • s[15:] which is _be and s[2:] which is _be_or_not_to_be giving 3 ( _be )
  • s[2:] which is _be_or_not_to_be and s[8:] which is _not_to_be giving 1 ( _ )

However, if s is huge, I don't want to create multiple copies when I do something like s[x:]. After hours of searching, I found the function buffer that maintains only one copy of the input string but I wasn't sure what is the most efficient way to utilize it here in this context. Any suggestions on how to achieve this?

like image 698
Legend Avatar asked Nov 10 '11 00:11

Legend


2 Answers

Here is a method without buffer which doesn't copy, as it only looks at one character at a time:

from itertools import islice, izip

s = "to_be_or_not_to_be"
pos = [15, 2, 8]


length = len(s)    

for start1, start2 in izip(pos, islice(pos, 1, None)):
    pref = 0
    for pos1, pos2 in izip(xrange(start1, length), xrange(start2, length)):
        if s[pos1] == s[pos2]:
            pref += 1
        else:
            break
    print pref
# prints 3 1

I use islice, izip, and xrange in case you're talking about potentially very long strings.

I also couldn't resist this "One Liner" which doesn't even require any indexing:

[next((i for i, (a, b) in 
    enumerate(izip(islice(s, start1, None), islice(s, start2, None))) 
        if a != b), 
    length - max((start1, start2))) 
 for start1, start2 in izip(pos, islice(pos, 1, None))]

One final method, using os.path.commonprefix:

[len(commonprefix((buffer(s, n), buffer(s, m)))) for n, m in zip(pos, pos[1:])]
like image 188
agf Avatar answered Oct 02 '22 13:10

agf


>>> import os
>>> os.path.commonprefix([s[i:] for i in pos])
'_'

Let Python to manage memory for you. Don't optimize prematurely.

To get the exact output you could do (as @agf suggested):

print [len(commonprefix([buffer(s, i) for i in adj_indexes]))
       for adj_indexes in zip(pos, pos[1:])]
# -> [3, 1]
like image 33
jfs Avatar answered Oct 02 '22 14:10

jfs