Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

String algorithm suggestion to find all the common prefixes of a list of strings

What algorithm would you suggest to find out the longest common prefixes of a list of strings?

I might have strings such as:

Call Mike and schedule meeting.
Call Lisa
Call Adam and ask for quote.
Implement new class for iPhone project
Implement new class for Rails controller
Buy groceries

I want to find out the following prefixes:

"Call "
"Implement new class "

I'll be using Objective C, so a ready made cocoa solution would be a plus (though not a must).

like image 372
cfischer Avatar asked Jul 09 '11 11:07

cfischer


2 Answers

Edit: for the clarified question:

  1. Sort the strings
  2. Find the longest common prefix of each adjacent pair
  3. Sort and dedupe the common prefixes, then remove any that's a strict prefix of another.

Actually, step (3) only requires that you remove any that's a dupe/prefix of another, which you could do with a trie or whatever instead of sorting. In fact it may be that the whole thing can be done faster with a suitably annotated trie - if you include a "count" at each node then you're looking precisely for nodes with a count of 2+, that have no children with a count of 2+.

But sorting is built in, and once you've sorted you can detect prefixes by looking at adjacent items, so it's probably less effort.

[Original answer:

Just a one-off operation, find the longest common prefix between all the strings?

I'd probably do it in terms of the length of the prefix. In pseudo-code, and assuming nul-terminated strings:

prefixlen = strlen(first_string);
foreach string in the list {
    for (i = 0; i < prefixlen; ++i) {
        if (string[i] != first_string[i]) {
            prefixlen = i;
            break;
        }
    }
    if (prefixlen == 0) break;
}

common_prefix = substring(firststring, 0, prefixlen);

]

like image 147
Steve Jessop Avatar answered Sep 28 '22 05:09

Steve Jessop


You could insert all your strings into a trie (aka prefix tree). Then traverse the trie from the root until you find a node with more than one child (or just stop inserting strings when you would have to append a second child to a node).

like image 26
omz Avatar answered Sep 28 '22 04:09

omz