
Regex to select semicolons that are not enclosed in double quotes

Tags:

regex

I have a string like this:

a;b;"aaa;;;bccc";deef

I want to split the string on the delimiter ; only when the ; is not inside double quotes. After the split, the result should be:

a
b
"aaa;;;bccc"
deef

I tried using look-behind, but I'm not able to find a correct regular expression for splitting.

Vivek Goel asked Jun 29 '13 05:06


1 Answer

Regular expressions are probably not the right tool for this. If possible, use a CSV library and specify ; as the delimiter and " as the quote character. That should give you exactly the fields you are looking for.

That being said, here is one approach. It matches a ; only when an even number of quotation marks lies between that ; and the end of the string, which means the ; is not inside a quoted section.

;(?=(([^"]*"){2})*[^"]*$)

Example: http://www.rubular.com/r/RyLQyR8F19
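For illustration, here is a minimal sketch of how the pattern above might be used from Python with re.split. The helper name split_unquoted is hypothetical. The groups are written as non-capturing (?:...) here, because re.split would otherwise include the captured group text in the result list.

```python
import re

# Same lookahead as above, with non-capturing groups so that re.split
# returns only the fields and not the group contents.
UNQUOTED_SEMICOLON = re.compile(r';(?=(?:(?:[^"]*"){2})*[^"]*$)')

def split_unquoted(text):
    """Split text on ';' characters that are not inside double quotes."""
    return UNQUOTED_SEMICOLON.split(text)

fields = split_unquoted('a;b;"aaa;;;bccc";deef')
print(fields)  # ['a', 'b', '"aaa;;;bccc"', 'deef']
```

The quoted field keeps its surrounding quotation marks, which matches the output asked for in the question.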

This will break down if a quoted field can contain escaped quotation marks, for example a;"foo\"bar";c.
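To make the failure concrete, the sketch below runs the regex on that input and compares it with Python's csv module. It assumes the backslash is meant as an escape character, which the csv reader is told via escapechar. The helper name parse_fields is hypothetical.

```python
import csv
import re

# Non-capturing version of the pattern, so re.split returns only fields.
UNQUOTED_SEMICOLON = re.compile(r';(?=(?:(?:[^"]*"){2})*[^"]*$)')

def parse_fields(line):
    """Parse one ;-delimited line, treating backslash as an escape character."""
    reader = csv.reader([line], delimiter=';', quotechar='"', escapechar='\\')
    return next(reader)

line = r'a;"foo\"bar";c'

# The escaped quote makes the quote count odd after the first ';',
# so the regex does not split there.
print(UNQUOTED_SEMICOLON.split(line))  # ['a;"foo\\"bar"', 'c']

# The csv reader understands the escape and returns three fields.
print(parse_fields(line))  # ['a', 'foo"bar', 'c']
```

Note that the csv reader also strips the surrounding quotes and the escape character from the field.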

Here is a much cleaner example using Python's csv module:

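import csv
import io

# Treat the string as a one-line CSV file: ';' is the delimiter, '"' the quote character.
reader = csv.reader(io.StringIO('a;b;"aaa;;;bccc";deef'),
                    delimiter=';', quotechar='"')
for row in reader:
    # Each row is a list of fields; quoted fields come back without their quotes.
    print('\n'.join(row))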

Andrew Clark answered Sep 28 '22 17:09