I'm looking for a Python library for finding the longest common substring of a set of strings. There are two ways to solve this problem:

- using suffix trees
- using dynamic programming

Which method is implemented is not important. What matters is that it can be used on a set of strings, not only on two strings.


Longest common substring from more than two strings

2 Answers

These paired functions will find the longest common substring in an arbitrary list of strings:

    def long_substr(data):
        substr = ''
        if len(data) > 1 and len(data[0]) > 0:
            for i in range(len(data[0])):
                for j in range(len(data[0])-i+1):
                    # Only test candidates longer than the best match so far.
                    if j > len(substr) and is_substr(data[0][i:i+j], data):
                        substr = data[0][i:i+j]
        return substr

    def is_substr(find, data):
        # An empty candidate or an empty list never counts as a match.
        if len(data) < 1 or len(find) < 1:
            return False
        for i in range(len(data)):
            if find not in data[i]:
                return False
        return True

    print(long_substr(['Oh, hello, my friend.',
                       'I prefer Jelly Belly beans.',
                       'When hell freezes over!']))  # prints: ell

No doubt the algorithm could be improved and I've not had a lot of exposure to Python, so maybe it could be more efficient syntactically as well, but it should do the job.

EDIT: in-lined the second is_substr function as demonstrated by J.F. Sebastian. Usage remains the same. Note: no change to algorithm.

    def long_substr(data):
        substr = ''
        if len(data) > 1 and len(data[0]) > 0:
            for i in range(len(data[0])):
                for j in range(len(data[0])-i+1):
                    if j > len(substr) and all(data[0][i:i+j] in x for x in data):
                        substr = data[0][i:i+j]
        return substr
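One possible refinement of the brute-force search above, offered only as a sketch: take the shortest string as the reference (no common substring can be longer than it) and try the longest candidate lengths first, so the first hit can be returned immediately. The name `long_substr_fast` is made up for this illustration, and unlike the version above it returns the whole string when the list holds a single string.

```python
def long_substr_fast(data):
    """Longest substring shared by every string in data (illustrative helper)."""
    if not data:
        return ''
    # The shortest string bounds the length of any common substring.
    ref = min(data, key=len)
    # Try long candidates first so the first match is already the longest.
    for length in range(len(ref), 0, -1):
        for start in range(len(ref) - length + 1):
            candidate = ref[start:start + length]
            if all(candidate in s for s in data):
                return candidate
    return ''


print(long_substr_fast(['Oh, hello, my friend.',
                        'I prefer Jelly Belly beans.',
                        'When hell freezes over!']))  # prints: ell
```

The worst case is still the same order of work as the original loops; the gain comes from stopping early when a long match exists.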

Hope this helps,

Jason.

answered Oct 17 '22 09:10

jtjacques

This can be done shorter:

    def long_substr(data):
        substrs = lambda x: {x[i:i+j] for i in range(len(x)) for j in range(len(x) - i + 1)}
        s = substrs(data[0])
        for val in data[1:]:
            s.intersection_update(substrs(val))
        return max(s, key=len)

Python's sets are hash tables (in CPython, at least), and every substring of every string is stored separately, which makes this a bit inefficient. If you (1) implement a set datatype as a trie and (2) store only the suffixes in the trie and then treat every node as an endpoint (the equivalent of adding all substrings), then in theory I would guess this is pretty memory efficient, especially since intersections of tries are very easy.

Nevertheless, this is short and premature optimization is the root of a significant amount of wasted time.
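For anyone who does want to try the trie idea mentioned above, here is a rough sketch. It is an assumption-laden illustration, not a tuned implementation: `Node` and `long_substr_trie` are hypothetical names, all strings share one trie instead of intersecting separate ones, and a naive suffix trie like this can still grow quadratically with string length, so the guessed memory benefit is not demonstrated here.

```python
class Node:
    """One trie node (hypothetical helper for this sketch)."""
    __slots__ = ("children", "owners")

    def __init__(self):
        self.children = {}   # next character -> child Node
        self.owners = set()  # indices of the input strings that reach this node


def long_substr_trie(data):
    """Longest common substring via a naive shared suffix trie."""
    if not data:
        return ""
    root = Node()
    # Insert every suffix of every string; each node on the path is an
    # endpoint, so every substring is represented by exactly one node.
    for index, text in enumerate(data):
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = Node()
                child.owners.add(index)
                node = child
    # Walk only through nodes reached by all strings and keep the deepest one.
    best = ""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node.children.items():
            if len(child.owners) == len(data):
                candidate = prefix + ch
                if len(candidate) > len(best):
                    best = candidate
                stack.append((child, candidate))
    return best


print(long_substr_trie(['Oh, hello, my friend.',
                        'I prefer Jelly Belly beans.',
                        'When hell freezes over!']))  # prints: ell
```

When several common substrings tie for the longest length, which one is returned depends on traversal order.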

answered Oct 17 '22 08:10

Herbert


Tags:

python

string

longest-substring

Nicolas NOEL
