Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding an Insertion in a String

What's the best way of checking if StringA = StringB with an another StringC inserted at some arbitrary point?

For example, given abcdef and abcXYZdef, I want to find that abcXYZdef is abcdef with XYZ inserted at position 4.

On the other hand, given abcdef and abRSTcdXYZef, I want to find that the first string cannot be turned into the second with only a single insertion.

I know I could go over StringA character by character, from both ends, and check if it covers the whole of StringB, but that would be rather tedious to write. It would also be rather slow to do this in Python (which i am working in) and I would rather not write a special C-extension just for this.

Are there any clever things I can do with Regex's or other standard string-manipulation functions that can do this for me?

edit: To clarify, StringC is completely unknown; There may not even be a valid StringC, and i will want to know if that is the case.

like image 928
Li Haoyi Avatar asked Aug 02 '11 20:08

Li Haoyi


2 Answers

A very underappreciated gem in the standard lib is difflib...

>>> import difflib
>>> s = difflib.SequenceMatcher(None, "GHSKWITNIFSI", "GHSKWAGDITNIFSI")
>>> s.get_matching_blocks()[:-1]
[(0, 0, 5), (5, 8, 7)]
>>> s = difflib.SequenceMatcher(None, "GHSKWITNIFSI", "GHSKWITNIFSI")
>>> s.get_matching_blocks()[:-1]
[(0, 0, 12)]
like image 165
pyroscope Avatar answered Sep 19 '22 01:09

pyroscope


This ... feels kludgy to a degree, and it's only probably half-way there, but it seems like it found the substring in your example and could probably be expanded a bit. I can revise it some in a minute with some more time to test, but it's an approach concept:

s1 = 'GHSKWITNIFSI'
s2 = 'GHSKWAGDITNIFSI'

l = len(s2) - len(s1)

for i in range(len(s1)):
 if s2[0:i] + s2[i + l:] == s1:
  print i
  break

I don't like the use of range(len()), but in this particular use scenario I think it's appropriate. It will print the index where an insertion took place if a single insertion will turn s1 into s2.

like image 20
g.d.d.c Avatar answered Sep 22 '22 01:09

g.d.d.c