How to split strings using regular expressions

Tags: string, c#, regex

I want to split a string into a list or array.

Input: green,"yellow,green",white,orange,"blue,black"

The split character is the comma (,), but it must ignore commas inside quotes.

The output should be:

  • green
  • yellow,green
  • white
  • orange
  • blue,black

Thanks.

asked Nov 14 '11 at 15:11 by hui


1 Answer

Actually, this is easy enough to do by matching the fields with Regex.Match instead of splitting on the commas:

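using System;
using System.Text.RegularExpressions;

string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";
try
{
    // First alternative: letters and commas enclosed in quotes; second alternative: a plain run of letters
    Regex regexObj = new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success)
    {
        Console.WriteLine("{0}", matchResults.Value);
        // matched text: matchResults.Value
        // match start: matchResults.Index
        // match length: matchResults.Length
        matchResults = matchResults.NextMatch();
    }
}
catch (ArgumentException ex)
{
    // Thrown by the Regex constructor if the pattern has a syntax error
    Console.WriteLine(ex.Message);
}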

Output:

green
yellow,green
white
orange
blue,black
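
Since the question asks for a list or array rather than console output, the same matches can be collected with Regex.Matches. This is only a minimal sketch: the class name QuotedCsv and the method name SplitFields are made up for illustration, and the pattern is the one from the answer, unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedCsv
{
    // Same pattern as in the answer: quoted letters/commas, or a plain run of letters.
    private static readonly Regex FieldPattern =
        new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every matched field as a list item.
    public static List<string> SplitFields(string input)
    {
        var fields = new List<string>();
        foreach (Match m in FieldPattern.Matches(input))
        {
            fields.Add(m.Value);
        }
        return fields;
    }

    public static void Main()
    {
        string subject = @"green,""yellow,green"",white,orange,""blue,black""";
        List<string> fields = SplitFields(subject);
        // Join with a separator that cannot be confused with the commas inside fields.
        Console.WriteLine(string.Join(" | ", fields));
    }
}
```

If an array is needed instead, calling ToArray() on the returned list should be enough.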

Explanation:

@"
             # Match either the regular expression below (attempting the next alternative only if this one fails)
   (?<=         # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
      ""            # Match one double-quote character literally (written as "" because the pattern is a C# verbatim string)
   )
   \b           # Assert position at a word boundary
   [a-z,]       # Match a single character present in the list below
                   # A character in the range between “a” and “z”
                   # The character “,”
      +            # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \b           # Assert position at a word boundary
   (?=          # Assert that the regex below can be matched, starting at this position (positive lookahead)
      ""            # Match one double-quote character literally (written as "" because the pattern is a C# verbatim string)
   )
|            # Or match regular expression number 2 below (the entire match attempt fails if this one fails to match)
   [a-z]        # Match a single character in the range between “a” and “z”
      +            # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"
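
Note that the character classes above only cover letters (and commas inside quotes), so a field containing digits or spaces would be cut into pieces. One possible variation, not part of the original answer, matches a whole quoted field or any run of characters other than commas and quotes. It is a sketch under the assumption that quotes are never escaped inside a field and that empty unquoted fields can be dropped; the names LooseCsv and SplitFields are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LooseCsv
{
    // Alternative 1: a quoted field, with the text between the quotes captured in group 1.
    // Alternative 2: an unquoted field, i.e. one or more characters that are neither a comma nor a quote.
    private static readonly Regex FieldPattern = new Regex(@"""([^""]*)""|[^,""]+");

    // Hypothetical helper: returns the fields with the surrounding quotes removed.
    public static List<string> SplitFields(string input)
    {
        var fields = new List<string>();
        foreach (Match m in FieldPattern.Matches(input))
        {
            // Group 1 only participates when the quoted alternative matched.
            fields.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Value);
        }
        return fields;
    }

    public static void Main()
    {
        string subject = @"green,""yellow,green"",white 2,orange,""blue,black""";
        Console.WriteLine(string.Join(" | ", SplitFields(subject)));
    }
}
```

For input with escaped quotes, embedded newlines, or empty unquoted fields that must be preserved, a dedicated CSV parser is likely the safer choice.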
answered Sep 21 '22 at 05:09 by FailedDev