Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Determine prefix from a set of (similar) strings

I have a set of strings, e.g.

my_prefix_what_ever my_prefix_what_so_ever my_prefix_doesnt_matter 

I simply want to find the longest common portion of these strings, here the prefix. In the above the result should be

my_prefix_ 

The strings

my_prefix_what_ever my_prefix_what_so_ever my_doesnt_matter 

should result in the prefix

my_ 

Is there a relatively painless way in Python to determine the prefix (without having to iterate over each character manually)?

PS: I'm using Python 2.6.3.

like image 683
Kawu Avatar asked Jul 16 '11 15:07

Kawu


People also ask

How do you find the prefix of a string?

To check if a String str starts with a specific prefix string prefix in Java, use String. startsWith() method. Call startsWith() method on the string str and pass the prefix string prefix as argument. If str starts with the prefix string prefix , then startsWith() method returns true.

What is common prefix string?

The longest common prefix for an array of strings is the common prefix between 2 most dissimilar strings. For example, in the given array {“apple”, “ape”, “zebra”}, there is no common prefix because the 2 most dissimilar strings of the array “ape” and “zebra” do not share any starting characters.

How do you find the longest common prefix between two strings?

Given the array of strings S[], you need to find the longest string S which is the prefix of ALL the strings in the array. Longest common prefix (LCP) for a pair of strings S1 and S2 is the longest string S which is the prefix of both S1 and S2. For Example: longest common prefix of “abcdefgh” and “abcefgh” is “ABC”.


1 Answers

Never rewrite what is provided to you: os.path.commonprefix does exactly this:

Return the longest path prefix (taken character-by-character) that is a prefix of all paths in list. If list is empty, return the empty string (''). Note that this may return invalid paths because it works a character at a time.

For comparison to the other answers, here's the code:

# Return the longest prefix of all list elements. def commonprefix(m):     "Given a list of pathnames, returns the longest common leading component"     if not m: return ''     s1 = min(m)     s2 = max(m)     for i, c in enumerate(s1):         if c != s2[i]:             return s1[:i]     return s1 
like image 173
Ned Batchelder Avatar answered Sep 19 '22 07:09

Ned Batchelder