Need C# Regex to get pairs of words in a sentence

Tags: c#, regex

Is there a regex that would take the following sentence:

"I want this split up into pairs"

and generate the following list:

"I want", "want this", "this split", "split up", "up into", "into pairs"

EZE asked Jul 14 '11 14:07


1 Answer

Since each word has to appear in two pairs, the match must not consume the second word of a pair. A lookahead assertion captures that word without consuming it:

Regex regexObj = new Regex(
    @"(     # Match and capture in backreference no. 1:
     \w+    # one or more alphanumeric characters
     \s+    # one or more whitespace characters.
    )       # End of capturing group 1.
    (?=     # Assert that there follows...
     (\w+)  # another word; capture that into backref 2.
    )       # End of lookahead.", 
    RegexOptions.IgnorePatternWhitespace);
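string subjectString = "I want this split up into pairs";
List<string> resultList = new List<string>();  // needs System.Collections.Generic
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    // Group 1 is "word + whitespace"; group 2 is the next word, captured by the lookahead.
    resultList.Add(matchResult.Groups[1].Value + matchResult.Groups[2].Value);
    matchResult = matchResult.NextMatch();
}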
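For reference, the same idea can be condensed into a small self-contained program. This is only a sketch: the class name `WordPairExample` and the method name `GetWordPairs` are made up for illustration, and the pattern is the one above written without the inline comments.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WordPairExample
{
    // Hypothetical helper (not part of the answer above): returns every
    // pair of adjacent words in the input text.
    public static List<string> GetWordPairs(string text)
    {
        var pairs = new List<string>();
        // Group 1: a word plus its trailing whitespace (consumed by the match).
        // Group 2: the following word, captured inside a lookahead (not consumed),
        //          so the next match can start at that same word.
        foreach (Match m in Regex.Matches(text, @"(\w+\s+)(?=(\w+))"))
        {
            pairs.Add(m.Groups[1].Value + m.Groups[2].Value);
        }
        return pairs;
    }

    static void Main()
    {
        foreach (string pair in GetWordPairs("I want this split up into pairs"))
        {
            Console.WriteLine(pair);
        }
        // Prints, one per line:
        // I want, want this, this split, split up, up into, into pairs
    }
}
```

Group 1 keeps the whitespace exactly as it appears in the input, so a pair that spans a tab or a double space will contain it unchanged.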

For groups of three words:

Regex regexObj = new Regex(
    @"(     # Match and capture in backreference no. 1:
     \w+    # one or more alphanumeric characters
     \s+    # one or more whitespace characters.
    )       # End of capturing group 1.
    (?=     # Assert that there follows...
     (      # and capture...
      \w+   # another word,
      \s+   # whitespace,
      \w+   # word.
     )      # End of capturing group 2.
    )       # End of lookahead.", 
    RegexOptions.IgnorePatternWhitespace);

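The matching loop stays the same: group 1 plus group 2 now yields three words. Larger groups follow the same pattern by extending the lookahead.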
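One possible way to generalize this to groups of any size is to build the lookahead with a repetition count instead of writing out each extra word. The following is a sketch under that assumption; `WordGroupExample`, `GetWordGroups` and the parameter `n` are hypothetical names, not something from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WordGroupExample
{
    // Hypothetical helper: returns every run of n consecutive words (n >= 2).
    public static List<string> GetWordGroups(string text, int n)
    {
        if (n < 2)
        {
            throw new ArgumentOutOfRangeException(nameof(n), "n must be at least 2");
        }

        // Group 1 consumes the first word and its whitespace.
        // Group 2 sits in a lookahead and captures the remaining n - 1 words:
        // one word, then (whitespace + word) repeated n - 2 times.
        string pattern = @"(\w+\s+)(?=(\w+(?:\s+\w+){" + (n - 2) + "}))";

        var groups = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            groups.Add(m.Groups[1].Value + m.Groups[2].Value);
        }
        return groups;
    }

    static void Main()
    {
        foreach (string g in GetWordGroups("I want this split up into pairs", 3))
        {
            Console.WriteLine(g);
        }
        // Prints, one per line:
        // I want this, want this split, this split up, split up into, up into pairs
    }
}
```

With `n` set to 2 the repetition count is zero, which reduces the pattern to the pair version shown earlier.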

Tim Pietzcker answered Nov 09 '22 14:11
