Consider the following character strings:
"bla ; bla"; bla
"bla "";"" bla"; bla
"bla ";" bla"; bla
I'm trying to match any ; that is not in a quoted field (e.g. "bla ; bla") or in between 2 quotes.
In other words, I would like to match the second ; in the first 2 strings and all ; in the last string.
Here are the two regexes I've tried so far, but I can't come up with one that works in all cases.
^(['"])(?:(?!\1).)*\1(?=;)(*SKIP)(*F)|;
^(['"])(?:(?!(?!\1)\1).)*\1(?=;)(*SKIP)(*F)|;
Any idea?
EDIT
I omitted several important details in my initial question. The example lines above are from .csv files. I'm trying to extract all field separators ; in lines from different files. The problem I have is distinguishing between a quoted ; inside a quoted field (line 2) and two quoted fields separated by ; (line 3). A quoted field is always followed by ; in my case.
Use an actual CSV parser (well, Semicolon-SV) like Text::CSV_XS instead of trying to hack up something with regular expressions:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use Text::CSV_XS;
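# sep_char => ";" makes the parser split on semicolons instead of commas
my $csv = Text::CSV_XS->new({ binary => 1, sep_char => ";"});
while (my $row = $csv->getline(\*DATA)) {
    # Print the first field of each row (outer quotes removed, "" unescaped)
    say $row->[0];
}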
__DATA__
"bla ; bla"; bla
"bla "";"" bla"; bla
"bla ";" bla"; bla
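If you really do need the positions of the separators rather than the parsed fields, one regex option is to consume each complete quoted field first and throw it away with (*SKIP)(*FAIL), so that only a ; outside quotes can match. This is just a rough sketch, not a replacement for the parser: it assumes quotes inside a field are escaped by doubling them (""), that fields are only ever quoted with ", and that a quoted field never spans several lines. The helper name separator_offsets is made up for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;

# First alternative: a whole quoted field, where "" inside the field is an
# escaped quote. (*SKIP)(*FAIL) discards it and resumes matching after it.
# Second alternative: a ; that was not swallowed by a quoted field.
my $sep = qr/"(?:[^"]|"")*"(*SKIP)(*FAIL)|;/;

# Hypothetical helper: returns the 0-based offsets of all separators in a line.
sub separator_offsets {
    my ($line) = @_;
    my @offsets;
    while ($line =~ /$sep/g) {
        push @offsets, $-[0];    # start offset of the current match
    }
    return @offsets;
}

for my $line ('"bla ; bla"; bla', '"bla "";"" bla"; bla', '"bla ";" bla"; bla') {
    say join(",", separator_offsets($line));
}
```

For the three sample lines this prints 11, then 15, then 6,13: the second ; in the first two lines and both ; in the last one. A stray quote in the middle of an unquoted field would confuse it, which is one more reason to prefer the parser when you can.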