
Regex for finding an unterminated string

I need to search for lines in a CSV file that end in an unterminated, double-quoted string.

For example:

1,2,a,b,"dog","rabbit

would match whereas

1,2,a,b,"dog","rabbit","cat bird"
1,2,a,b,"dog",rabbit

would not.

I have very limited experience with regular expressions, and the only thing I could think of is something like

"[^"]*$

However, that matches from the last quote to the end of the line, whether or not that quote actually opens a string.

How would this be done?

Austin Hyde asked May 25 '10 15:05


3 Answers

Assuming quotes can't be escaped, you need to test the parity of quotes (making sure that there's an even number of them instead of odd). Regular expressions are great for that:

^(([^"]*"){2})*[^"]*$

That will match all lines with an even number of quotes. You can invert the result for all strings with an odd number. Or you can just add another ([^"]*") part at the beginning:

^[^"]*"(([^"]*"){2})*[^"]*$
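As a quick sanity check, here is a minimal sketch that runs both patterns against the three sample lines from the question. It assumes Python's re module; the names EVEN_QUOTES, ODD_QUOTES and has_unterminated_string are made up for illustration.

```python
import re

# Patterns copied from the answer above; they assume quotes are never escaped.
EVEN_QUOTES = re.compile(r'^(([^"]*"){2})*[^"]*$')
ODD_QUOTES = re.compile(r'^[^"]*"(([^"]*"){2})*[^"]*$')

def has_unterminated_string(line: str) -> bool:
    """Return True when the line contains an odd number of double quotes."""
    return ODD_QUOTES.match(line) is not None

samples = [
    '1,2,a,b,"dog","rabbit',              # unterminated: should be reported
    '1,2,a,b,"dog","rabbit","cat bird"',  # balanced: should be skipped
    '1,2,a,b,"dog",rabbit',               # balanced: should be skipped
]

for line in samples:
    print(has_unterminated_string(line), line)
```

Because [^"] can never consume a quote, each repetition of the inner group accounts for exactly two quotes, which is what makes the parity test reliable.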

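Reluctant (lazy) quantifiers may look like a way to get a simpler expression, but they do not work here. Because . can also match a quote, backtracking lets the first pattern below match every line and the second match any line that contains at least one quote, so neither one tests parity:

^((.*?"){2})*.*?$         #even (unreliable)
^.*?"((.*?"){2})*.*?$     #odd (unreliable)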

Now, if quotes can be escaped, it's a different question entirely, but the approach would be similar: determine the parity of unescaped quotes.
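One possible way to count only unescaped quotes is sketched below. It assumes backslash escaping (\" and \\), which the question does not specify, and the names UNIT, ODD_UNESCAPED and has_unterminated_string are hypothetical.

```python
import re

# Assumption: a backslash escapes the next character (\" or \\).
# UNIT matches one ordinary character or one backslash escape, never a bare quote.
UNIT = r'(?:[^"\\]|\\.)'

# One unescaped quote, then any number of unescaped-quote pairs, then the rest.
ODD_UNESCAPED = re.compile(rf'^{UNIT}*"(?:{UNIT}*"{UNIT}*")*{UNIT}*$')

def has_unterminated_string(line: str) -> bool:
    """Return True when the number of unescaped double quotes is odd."""
    return ODD_UNESCAPED.match(line) is not None

print(has_unterminated_string(r'1,"say \"hi\"'))   # closing quote missing
print(has_unterminated_string(r'1,"say \"hi\""'))  # properly closed
```

If the file instead escapes a quote by doubling it (""), as many CSV writers do, each escape adds two quotes and the plain parity patterns above should still apply unchanged.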

Welbog answered Sep 21 '22 13:09


Assuming that the strings cannot contain ", you need to match a string that has an odd number of quotes, like this:

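^([^"]*("[^"]*")?)*"[^"]*$

This matches zero or more runs of unquoted text, each optionally followed by a complete quoted string, then one dangling quote and the rest of the line. The anchors matter: without them, a search finds a match in any line that contains a quote.

Note that the nested quantifiers make this pattern vulnerable to catastrophic backtracking (a ReDoS attack) on long lines that do not match.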
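If backtracking cost is a concern, one alternative is to skip the regex and count quotes directly, which takes a single linear pass per line. This is a sketch under the same assumption that quotes cannot be escaped; has_odd_quote_count and find_unterminated are illustrative names.

```python
from typing import Iterable, Iterator, Tuple

def has_odd_quote_count(line: str) -> bool:
    """Return True when the line holds an odd number of double quotes."""
    return line.count('"') % 2 == 1

def find_unterminated(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line with an unbalanced quote."""
    for number, line in enumerate(lines, start=1):
        if has_odd_quote_count(line):
            yield number, line

rows = [
    '1,2,a,b,"dog","rabbit',
    '1,2,a,b,"dog","rabbit","cat bird"',
    '1,2,a,b,"dog",rabbit',
]
for number, line in find_unterminated(rows):
    print(number, line)
```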

SLaks answered Sep 20 '22 13:09


Try this one:

".+[^"](,|$)

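This matches a quote (anywhere in the line), followed greedily by one or more characters, and then a non-quote character that sits immediately before a comma or the end of the line.

The intended net effect is to match only lines with dangling quoted strings. Be aware that .+ can itself match quotes, so the pattern also matches 1,2,a,b,"dog",rabbit from the question, which should not match.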

I think it's even immune to 'nested expandos attacks' (we do live in a very dangerous world ...)
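Before relying on a pattern like this, it may be worth checking it against the question's own sample lines. The sketch below assumes Python's re.search; PATTERN, samples and matches are illustrative names, and the second value in each pair is the result the question asks for, so any line where the two printed values differ is one the pattern gets wrong.

```python
import re

# Pattern copied from this answer.
PATTERN = re.compile(r'".+[^"](,|$)')

# Each sample line is paired with the outcome the question asks for.
samples = [
    ('1,2,a,b,"dog","rabbit', True),
    ('1,2,a,b,"dog","rabbit","cat bird"', False),
    ('1,2,a,b,"dog",rabbit', False),
]

def matches(line: str) -> bool:
    """Return True when the pattern is found anywhere in the line."""
    return PATTERN.search(line) is not None

for line, wanted in samples:
    print(f'got={matches(line)} wanted={wanted} {line}')
```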

Adrian Regan answered Sep 20 '22 13:09