Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to compare 2 string arrays and find all consecutive matches and save indices?

Tags:

arrays

c#

For example, if I have the following 2 arrays:

string[] userSelect = new string[] {"the", "quick", "brown", "dog", "jumps", "over"};
string[] original = new string[] {"the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"};

I'm trying to compare the userSelect array against the original array and get all consecutive matches based on index. The userSelect array will always be made up of strings from the original array. So the output would be like the following:

int[] match0 = new int[] {0, 1, 2}; // indices for "the quick brown"
int[] match2 = new int[] {4, 5}; // indices for "jumps over"
int[] match1 = new int[] {3}; // index for "dog"

The userSelect array length will never exceed the original array length, however it can be shorter and the words can be in any order. How would I go about doing this?

like image 934
rotaercz Avatar asked Jun 12 '13 20:06

rotaercz


2 Answers

Here's what I came up with

var matches = 
    (from l in userSelect.Select((s, i) => new { s, i })
     join r in original.Select((s, i) => new { s, i }) 
     on l.s equals r.s 
     group l by r.i - l.i into g
     from m in g.Select((l, j) => new { l.i, j = l.i - j, k = g.Key })
     group m by new { m.j, m.k } into h
     select h.Select(t => t.i).ToArray())
    .ToArray();

This will output

matches[0] // { 0, 1, 2 } the quick brown
matches[1] // { 4, 5 } jumps over
matches[2] // { 0 } the 
matches[3] // { 3 } dog

Using the input {"the", "quick", "brown", "the", "lazy", "dog"} yields:

matches[0] // { 0, 1, 2 } the quick brown
matches[1] // { 0 } the 
matches[2] // { 3 } the
matches[3] // { 3, 4, 5 } the lazy dog

Note that the calls to ToArray are optional. If you don't actually need the results in an array you can leave that out and save a little processing time.

To filter out any sequences that are completely contained with other larger sequences, you can run this code (note the orderby in the modified query):

var matches = 
    (from l in userSelect.Select((s, i) => new { s, i })
     join r in original.Select((s, i) => new { s, i }) 
     on l.s equals r.s 
     group l by r.i - l.i into g
     from m in g.Select((l, j) => new { l.i, j = l.i - j, k = g.Key })
     group m by new { m.j, m.k } into h
     orderby h.Count() descending
     select h.Select(t => t.i).ToArray());

int take = 0;
var filtered = matches.Where(m => !matches.Take(take++)
                                          .Any(n => m.All(i => n.Contains(i))))
    .ToArray();
like image 118
p.s.w.g Avatar answered Nov 23 '22 08:11

p.s.w.g


This would be easier if words couldn't be repeated . . .

The general idea is to create a Dictionary<string, List<int>> from the original word list. That will tell you which words are used at what positions. The dictionary for your sample would be:

key="the", value={0, 6}
key="quick", value={1}
key="brown", value={2}
... etc

Now, when you get the user's input, you step through it sequentially, looking up the words in your dictionary to get the list of positions.

So you look up a word and it's in the dictionary. You save the position(s) returned from the dictionary. Look up the next word. There are three conditions you need to handle:

  1. The word isn't in the dictionary. Save your previous consecutive grouping and go to the next word, where you'll potentially start a new group.
  2. The word is in the dictionary, but the none of the positions returned match the expected positions (the expected positions being one more than the saved positions from the last word). Save your previous consecutive group and go to the next word, where you'll potentially start a new group.
  3. The word is in the dictionary and one of the returned positions matches the expected position. Save those positions and go to the next word.

I hope you get the idea.

like image 31
Jim Mischel Avatar answered Nov 23 '22 08:11

Jim Mischel