Regex split string preserving quotes

Tags: c#, regex, split

I need to split a string like the one below, using a space as the delimiter, but any space inside a quoted section should be preserved.

research library "not available" author:"Bernard Shaw"

to

research
library
"not available"
author:"Bernard Shaw"

I am trying to do this in C#. I have this regex from another post on SO, @"(?<="")|\w[\w\s]*(?="")|\w+|""[\w\s]*""", which splits the string into

research
library
"not available"
author
"Bernard Shaw"

which unfortunately does not meet my exact requirements.

I'm looking for any regex that would do the trick.

Any help appreciated.

itsbalur asked Jan 24 '11 10:01


2 Answers

As long as there can be no escaped quotes inside quoted strings, the following should work:

splitArray = Regex.Split(subjectString, "(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*) (?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

This regex splits on space characters only if they are preceded and followed by an even number of quotes.

The regex without all those escaped quotes, explained:

(?<=      # Assert that it's possible to match this before the current position (positive lookbehind):
 ^        # The start of the string
 [^"]*    # Any number of non-quote characters
 (?:      # Match the following group...
  "[^"]*  # a quote, followed by any number of non-quote characters
  "[^"]*  # the same
 )*       # ...zero or more times (so 0, 2, 4, ... quotes will match)
)         # End of lookbehind assertion.
[ ]       # Match a space
(?=       # Assert that it's possible to match this after the current position (positive lookahead):
 (?:      # Match the following group...
  [^"]*"  # see above
  [^"]*"  # see above
 )*       # ...zero or more times.
 [^"]*    # Match any number of non-quote characters
 $        # Match the end of the string
)         # End of lookahead assertion
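
To try the split on the string from the question, a minimal console sketch could look like the following. The class and method names (QuoteAwareSplitter, Split, SplitDemo) are made up for this illustration and are not part of the answer; the pattern is the same one as above, written as a verbatim string so that quotes are doubled instead of backslash-escaped.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class QuoteAwareSplitter
{
    // Split on a space only when an even number of quotes precedes and follows it.
    // Assumes quotes are balanced and never escaped inside a quoted section.
    const string Pattern =
        @"(?<=^[^""]*(?:""[^""]*""[^""]*)*) (?=(?:[^""]*""[^""]*"")*[^""]*$)";

    public static string[] Split(string input)
    {
        // The pattern has no capturing groups, so Regex.Split returns only the pieces.
        return Regex.Split(input, Pattern);
    }
}

static class SplitDemo
{
    static void Main()
    {
        string subject = "research library \"not available\" author:\"Bernard Shaw\"";

        foreach (string part in QuoteAwareSplitter.Split(subject))
        {
            Console.WriteLine(part);
        }
        // Prints:
        // research
        // library
        // "not available"
        // author:"Bernard Shaw"
    }
}
```

Note that the pattern splits on every single space outside quotes, so two consecutive spaces produce an empty string in the result. If that matters, filtering out empty entries afterwards is one option.
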
Tim Pietzcker answered Nov 07 '22 10:11


Here you go:

C#:

Regex.Matches(subject, @"([^\s]*""[^""]+""[^\s]*)|\w+")

Regular expression:

([^\s]*\"[^\"]+\"[^\s]*)|\w+
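
Unlike Regex.Split, this approach matches the tokens themselves, so the values have to be collected from the MatchCollection. A small sketch of how that might look, with the helper names (QuoteAwareTokenizer, Tokenize, MatchDemo) invented for this example:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class QuoteAwareTokenizer
{
    // First alternative: a run of non-whitespace containing one quoted section.
    // Second alternative: a plain run of word characters.
    const string Pattern = @"([^\s]*""[^""]+""[^\s]*)|\w+";

    public static string[] Tokenize(string input)
    {
        return Regex.Matches(input, Pattern)
                    .Cast<Match>()          // works on older frameworks too
                    .Select(m => m.Value)
                    .ToArray();
    }
}

static class MatchDemo
{
    static void Main()
    {
        string subject = "research library \"not available\" author:\"Bernard Shaw\"";

        foreach (string token in QuoteAwareTokenizer.Tokenize(subject))
        {
            Console.WriteLine(token);
        }
        // Prints:
        // research
        // library
        // "not available"
        // author:"Bernard Shaw"
    }
}
```

One thing to be aware of: the \w+ fallback only accepts word characters, so an unquoted token such as state-of-the-art comes back as four separate matches with the hyphens dropped, and an empty pair of quotes ("") is not matched at all because of the [^"]+ in the middle.
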
Joel Rein answered Nov 07 '22 09:11