Regex matching spaces, but not in "strings"

Tags: c++, c, regex, qt

I am looking for a regular expression that matches spaces only if those spaces are not enclosed in double quotes ("). For example, in

Mary had "a little lamb"

it should match the first and the second space, but not the others.

I want to split the string only at the spaces that are not inside double quotes, and not at the quotes themselves.

I am using C++ with the Qt toolkit and want to use QString::split(QRegExp). QString is very similar to std::string, and QRegExp is basically a POSIX regex encapsulated in a class. If such a regex exists, the split would be trivial.

Examples:

Mary had "a little lamb"     =>   Mary,had,"a little lamb"
1" 2 "3                      =>   1" 2 "3 (no splitting at ")
abc def="g h i" "j k" = 12   =>   abc,def="g h i","j k",=,12

Sorry for the edits; I was very imprecise when I first asked the question. I hope it is somewhat clearer now.

Gunther Piez asked Dec 07 '22 06:12


1 Answer

(I know you just posted almost exactly the same answer yourself, but I can't bear to just throw all this away. :-/)

If it's possible to solve your problem with a regex split operation, the regex will have to match even numbers of quotation marks, as MSalters said. However, a split regex should match only the spaces you're splitting on, so the rest of the work has to be done in a lookahead. Here's what I would use:

" +(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"
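To try the pattern outside Qt, here is a minimal sketch that uses std::regex in place of QRegExp. That substitution is an assumption for illustration: the default ECMAScript grammar of std::regex also supports lookaheads, but it is not the same engine. The function name splitOutsideQuotes is made up.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `text` at runs of spaces that are followed by
// an even number of double quotes up to the end of the string.
std::vector<std::string> splitOutsideQuotes(const std::string& text)
{
    // Same pattern as above; in a raw string literal the quotes need no escaping.
    static const std::regex separator(R"re( +(?=(?:[^"]*"[^"]*")*[^"]*$))re");

    // Submatch index -1 makes the iterator yield the text between matches.
    std::sregex_token_iterator first(text.begin(), text.end(), separator, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

For the three examples in the question, this should give Mary / had / "a little lamb", then the single unsplit token 1" 2 "3, then abc / def="g h i" / "j k" / = / 12. Note that a string starting with a space would produce an empty first token.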

If the text is well formed, a lookahead for an even number of quotes is sufficient to determine that the just-matched space is not inside a quoted sequence. That is, lookbehinds aren't necessary, which is good because QRegExp doesn't seem to support them. Escaped quotes can be accommodated too, but the regex becomes quite a bit larger and uglier. But if you can't be sure the text is well formed, it's extremely unlikely you'll be able to solve your problem with split().
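If the text might not be well formed, one alternative to split() is a small hand-written scanner that tracks whether it is currently inside quotes. This is not a regex solution, just a sketch under two assumptions of mine: an unterminated quote extends to the end of the string, and escaped quotes are not handled. The name tokenizeOutsideQuotes is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split at spaces that are not inside double quotes,
// deciding "inside" from the quotes seen so far rather than by lookahead.
std::vector<std::string> tokenizeOutsideQuotes(const std::string& text)
{
    std::vector<std::string> tokens;
    std::string current;
    bool inQuotes = false;

    for (char c : text) {
        if (c == '"') {
            inQuotes = !inQuotes;    // a quote toggles the state and stays in the token
            current += c;
        } else if (c == ' ' && !inQuotes) {
            if (!current.empty()) {  // runs of spaces produce no empty tokens
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty())
        tokens.push_back(current);   // flush the last token, quoted or not
    return tokens;
}
```

On well-formed input it agrees with the regex split for the examples in the question. On something like a "b c it keeps "b c together as one token, which may or may not be what you want.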

By the way, QRegExp does not implement POSIX regular expressions--if it did, it wouldn't support lookaheads OR lookbehinds. Instead, it falls into the loosely-defined category of Perl-compatible regex flavors.

Alan Moore answered Dec 11 '22 10:12
