
Regular Expression to match a quoted string embedded in another quoted string

Tags: c#, regex

I have a comma-delimited, quote-qualified data source (a CSV). However, the data source provider sometimes does some wonky things. I've compensated for all but one of them (we read the file in line by line, then write it back out after cleansing), and I'm looking to solve the last remaining problem, but my regex-fu is pretty weak.

Matching a Quoted String inside of another Quoted String

So here is our example string...

"foobar", 356, "Lieu-dit "chez Métral", Chilly, FR", "-1,000.09", 467, "barfoo", 1,345,456,235,231, "935.18"

I am looking to match the substring "chez Métral", in order to replace it with the substring chez Métral. Ideally, in as few lines of code as possible. The final goal is to write the line back out (or return it as a method return value) with the replacement already done.

So our example string would end up as...

"foobar", 356, "Lieu-dit chez Métral, Chilly, FR", "-1,000.09", 467, "barfoo", 1,345,456,235,231, "935.18"

I know I could define a pattern such as (?<quotedstring>\"\w+[^,]+\") to match quoted strings, but my regex-fu is weak (database developer, almost never use C#), so I'm not sure how to match another quoted string within the named group quotedstring.


FYI: For those noticing the large integer that is formatted with commas but not quote-qualified, that's already handled. So is the random use of row delimiters (sometimes CR, sometimes LF), as are the other problems.

The Lazy DBA asked Nov 27 '12 16:11


1 Answer

Match the embedded quoted string with this regex:

(?<!,\s*|^)"([^",]*)"

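Then replace each match with $1, the captured text without its quotes. The negative lookbehind (?<!,\s*|^) skips any quote that opens a field (one at the start of the line, or one preceded by a comma and optional whitespace), and ([^",]*) captures everything up to the next quote as long as it contains no comma.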
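
For the C# side, a minimal sketch of applying this pattern with Regex.Replace might look like the following. The helper name StripInnerQuotes is hypothetical, and the sketch assumes C# 9 or later (top-level statements) and that each line has already been read into a string.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample line from the question, written as a regular string literal.
string input =
    "\"foobar\", 356, \"Lieu-dit \"chez Métral\", Chilly, FR\", " +
    "\"-1,000.09\", 467, \"barfoo\", 1,345,456,235,231, \"935.18\"";

Console.WriteLine(StripInnerQuotes(input));

// Hypothetical helper (the name is not from the answer): removes the quotes
// around any quoted run that does not open a field, keeping the text inside.
static string StripInnerQuotes(string line)
{
    // Verbatim string, so each literal " in the pattern is written as "".
    return Regex.Replace(line, @"(?<!,\s*|^)""([^"",]*)""", "$1");
}
```

One limitation to keep in mind: because the capture group excludes commas, an embedded quoted string that itself contains a comma would be left untouched by this pattern.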



If the pattern is written as a C# verbatim string literal (prefixed with @), each " must be escaped as "", so it becomes:

(?<!,\s*|^)""([^"",]*)""
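
To make the escaping concrete, here is a small sketch that writes the pattern both as a verbatim string and as a regular string and checks that they are the same regex. The variable names and the short test line are illustrative assumptions, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// The same pattern written two ways; both variable names are illustrative.
string verbatimPattern = @"(?<!,\s*|^)""([^"",]*)""";  // "" stands for one "
string regularPattern = "(?<!,\\s*|^)\"([^\",]*)\"";   // uses \" and \\ escapes

// Both literals produce the identical pattern text.
Console.WriteLine(verbatimPattern == regularPattern);

// Applying either one strips the quotes around the embedded string.
Console.WriteLine(Regex.Replace("\"a \"b\" c\", 1", verbatimPattern, "$1"));
```

The verbatim form is usually easier to read for regexes, since backslashes such as \s need no doubling.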
Anirudha answered Nov 15 '22 11:11