I've got a collection of strings, and I need to know the first index where they all differ. I can think of two ways to do this: (the following pseudo code is just off the top of my head and may be heavily bug-laden)
First Way:
var minLength = [go through all strings finding min length];
var set = new set()
for(i=0;i<minlength;i++)
{
for(str in strings)
{
var substring = str.substring(0,i);
if(set.contains(substring))
break; // not all different yet, increment i
set.add(substring)
}
set.clear(); // prepare for next length of substring
}
This strikes me as gross because of the use of a set data structure where it seems like one should not be needed.
Second Way:
var minLength = [go through all strings finding min length];
strings.sort();
for(i=0;i<minlength;i++)
{
boolean done = true;
char last = null;
for(str in strings)
{
char c = str[i];
if(c == last)
{
// not all different yet, increment i
done = false;
break;
}
last = c;
}
if(done)
return i;
}
But it annoys me that I have to run the sort first, because the sorting algorithm, by its very nature, has access to the information that I'm looking for.
Surely there must be a more efficient way than what I have listed above. Eventually I'd like to abstract it out to any type of array, but that will be trivial and it's simpler to think of it as a string problem.
Any help?
**UPDATE: I apparently didn't explain myself very well. If my strings are ["apple", "banana", "cucumber", "banking"], I want the function to return 3, because there were two strings ("banana" and "banking") that matched through index 0, 1, and 2, so 3 is the first index where they are all unique.
As Daniel mentioned below, a better way to state my needs is that: "I want to find index i where calling substring(0,i) on all my strings will result in all unique values."**
This is untested, but here's my attempt. (I may be making it more complicated than I have to, but I think it's a different way to look at it.)
The basic idea is to compile groups of items that match at the first element, then find the max unique index for each group, checking elements at each successive index.
int FirstUniqueIndex<T>(IEnumerable<IEnumerable<T>> myArrayCollection)
{
//just an overload so you don't have to specify index 0 all the time
return FirstUniqueIndex(myArrayCollection, 0);
}
int FirstUniqueIndex<T>(IEnumerable<IEnumerable<T>> myArrayCollection, int StartIndex)
{
/* Group the current collection by the element at StartIndex, and
* return a collection of these groups. Additionally, we're only interested
* in the groups with more than one element, so only get those.*/
var groupsWithMatches = from var item in myArrayCollection //for each item in the collection (called "item")
where item.Length > StartIndex //that are long enough
group by item[StartIndex] into g //group them by the element at StartIndex, and call the group "g"
where g.Skip(1).Any() //only want groups with more than one element
select g; //add the group to the collection
/* Now "groupsWithMatches" is an enumeration of groups of inner matches of
* your original arrays. Let's process them... */
if(groupsWithMatches.Any())
//some matches were found - check the next index for each group
//(get the maximum unique index of all the matched groups)
return groupsWithMatches.Max(group => FirstUniqueIndex(group, StartIndex + 1));
else
//no matches found, all unique at this index
return StartIndex;
}
And for the non-LINQ version of the above (I'll change it to use a List collection, but any collection will do). I'll even remove the lambda. Again untested, so try not to aim sharp implements in my direction.
int FirstUniqueIndex<T>(List<List<T>> myArrayCollection, int StartIndex)
{
/* Group the current collection by the element at StartIndex, and
* return a collection of these groups. Additionally, we're only interested
* in the groups with more than one element, so only get those.*/
Dictionary<T, List<List<T>>> groupsWithMatches = new Dictionary<T, List<List<T>>>();
//group all the items by the element at StartIndex
foreach(var item in myArrayCollection)
{
if(item.Count > StartIndex)
{
List<List<T>> group;
if(!groups.TryGetValue(item[StartIndex], out group))
{
//new group, so make it first
group = new List<List<T>>();
groups.Add(item[StartIndex], group);
}
group.Add(Item);
}
}
/* Now "groups" is an enumeration of groups of inner matches of
* your original arrays. Let's get the groups with more than one item. */
List<List<List<T>>> groupsWithMatches = new List<List<List<T>>>(groups.Count);
foreach(List<List<T> group in groupsWithMatches)
{
if(group.Count > 1)
groupsWithMatches.Add(group);
}
if(groupsWithMatches.Count > 0)
{
//some matches were found - check the next index for each group
//(get the maximum unique index of all the matched groups)
int max = -1;
foreach(List<List<T>> group in groupsWithMatches)
{
int index = FirstUniqueIndex(group, StartIndex + 1);
max = index > max ? index : max;
}
return max;
}
else
{
//no matches found, all unique at this index
return StartIndex;
}
}
have you looked at a Patricia trie? (Java implementation available on google code)
![]()
Build the trie, then traverse the data structure to find the maximum string position of all the internal nodes (black dots in the function above).
This seems like it should be an O(n) operation. I'm not sure whether your set implementation is O(n) or not -- it "smells" like O(n2) but I'm not sure.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With