
Regular expression to find and replace unescaped, non-successive double quotes in a CSV file

This is an extension of a related question that has already been answered.

I have a weekly CSV file that needs to be parsed. It looks like this:

"asdf","asdf","asdf","asdf"

But sometimes a text field contains extra unescaped double quotes, like this:

"asdf","as "something" df","asdf","asdf"

From other posts here, I was able to put together a regex:

(?m)""(?![ \t]*(,|$))

which matches two successive double quotes only if they are NOT followed by a comma or the end of the line, with optional spaces and tabs in between.

However, this finds only double quotes in succession. How do I modify it to find and replace or delete the double quotes around "something" in the file?

Thanks.

asked Oct 18 '25 21:10 by stevenjmyu


1 Answer

(?<!^|,)"(?!,|$)

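will match a double quote that is neither preceded nor followed by a comma and is not at the start or end of a line. When the whole file is processed as one string, multiline mode (the (?m) flag used in the question) is needed so that ^ and $ match at every line break.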
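
As an illustration only, here is a minimal Python sketch of applying that idea. The choice of Python is an assumption, since the question does not name a regex flavor. Python's re module rejects (?<!^|,) because the two alternatives have different widths, so the sketch splits the lookbehind into two fixed-width assertions, which should be equivalent. The function names and the sample data are hypothetical.

```python
import csv
import io
import re

# Python's re rejects (?<!^|,) because the alternatives differ in width,
# so the lookbehind is split into two fixed-width assertions. This is an
# adaptation for Python, not the pattern exactly as written above.
STRAY_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)', re.MULTILINE)

def remove_stray_quotes(text):
    """Delete quotes that are not at a field boundary."""
    return STRAY_QUOTE.sub('', text)

def escape_stray_quotes(text):
    """Double the stray quotes, the usual CSV escape."""
    return STRAY_QUOTE.sub('""', text)

sample = (
    '"asdf","asdf","asdf","asdf"\n'
    '"asdf","as "something" df","asdf","asdf"\n'
)

removed = remove_stray_quotes(sample)
# After escaping, the standard csv module can parse the text.
rows = list(csv.reader(io.StringIO(escape_stray_quotes(sample))))

print(removed)
print(rows[1])
```

Deleting the stray quotes loses information, while doubling them keeps the original text recoverable by a CSV parser. Neither variant can tell a stray quote from a delimiter when the stray quote happens to sit next to a comma, and quotes that are already doubled would be matched too, so this is only suitable for input shaped like the example.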

If you need to allow whitespace around the commas or at start/end-of-line, and if your regex flavor (which you didn't specify) allows arbitrary-length lookbehind (.NET does, for example), you can use

(?<!^\s*|,\s*)"(?!\s*,|\s*$)
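
For flavors without arbitrary-length lookbehind, one possible workaround is to match the legitimate boundary quotes as well and put them back in the replacement. The following Python sketch assumes that approach; it is a swapped-in technique, not the pattern above, and it uses [ \t] instead of \s so that the optional whitespace never spans a line break. The names BOUNDARY_OR_STRAY and remove_stray_quotes_ws are hypothetical.

```python
import re

# Group 1 captures quotes that sit at a field boundary (with optional
# spaces or tabs); the final alternative catches every other quote.
BOUNDARY_OR_STRAY = re.compile(
    r'((?:^|,)[ \t]*"'        # opening quote: line start or comma, then blanks
    r'|"(?=[ \t]*(?:,|$)))'   # closing quote: blanks, then comma or line end
    r'|"',                    # anything else is treated as a stray quote
    re.MULTILINE,
)

def remove_stray_quotes_ws(text):
    """Keep boundary quotes (group 1) and drop all other quotes."""
    return BOUNDARY_OR_STRAY.sub(lambda m: m.group(1) or '', text)

sample_ws = '  "asdf" , "as "something" df" ,"asdf"  \n'
cleaned_ws = remove_stray_quotes_ws(sample_ws)
print(cleaned_ws)
```

The closing-quote alternative uses a lookahead rather than consuming the comma, so the comma is still available to anchor the next opening quote.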

answered Oct 20 '25 12:10 by Tim Pietzcker


