 

Match a character only when not in a quoted field or in between 2 quotes

Tags: regex, perl

Consider the following character strings:

"bla ; bla"; bla
"bla "";"" bla"; bla
"bla ";" bla"; bla

I'm trying to match any ; that is not in a quoted field (e.g. "bla ; bla") or in between 2 quotes.

In other words, I would like to match the second ; in the first 2 strings and all ; in the last string.

Here are the two regexes I've tried, but I can't get either one to work in all cases.

^(['"])(?:(?!\1).)*\1(?=;)(*SKIP)(*F)|;
^(['"])(?:(?!(?!\1)\1).)*\1(?=;)(*SKIP)(*F)|;

Any idea?

EDIT

I omitted several important details in my initial question. The example lines above come from .csv files. I'm trying to extract every field separator ; from lines in different files. The difficulty is distinguishing a quoted ; inside a quoted field (line 2) from two quoted fields separated by ; (line 3). In my case, a quoted field is always followed by ;.

Junitar asked Dec 31 '22 13:12



1 Answer

Use an actual CSV parser (well, semicolon-separated values here) such as Text::CSV_XS instead of trying to hack something together with regular expressions:

#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use Text::CSV_XS;

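# sep_char sets the field separator; binary allows any byte inside fields.
my $csv = Text::CSV_XS->new({ binary => 1, sep_char => ";"});

# getline parses one record at a time and returns an array ref of fields.
while (my $row = $csv->getline(\*DATA)) {
    say $row->[0];    # print the first field of each record
}
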
__DATA__
"bla ; bla"; bla
"bla "";"" bla"; bla
"bla ";" bla"; bla
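
If only the positions of the separators are needed and a parser module is not an option, the two attempts in the question can probably be reduced to a single pattern. The following is a minimal sketch, not a replacement for a real parser. It assumes that fields are quoted with double quotes only, that a literal quote inside a field is doubled (""), and that a quoted field never spans several lines. The helper name separator_offsets is made up for this illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;

# Hypothetical helper: returns the 0-based offsets of every ';' that is
# outside a double-quoted field. Assumes embedded quotes are doubled ("").
sub separator_offsets {
    my ($line) = @_;
    my @offsets;
    # First alternative: consume a whole quoted field, then throw it away
    # with (*SKIP)(*FAIL) so the next attempt starts after the field.
    # Second alternative: a bare ';', which is a real separator.
    while ($line =~ /"(?:[^"]|"")*"(*SKIP)(*FAIL)|;/g) {
        push @offsets, $-[0];    # $-[0] is the start offset of the match
    }
    return @offsets;
}

for my $line ('"bla ; bla"; bla', '"bla "";"" bla"; bla', '"bla ";" bla"; bla') {
    say join(',', separator_offsets($line)), "\t", $line;
}
```

For the three sample lines this reports offset 11, offset 15, and offsets 6 and 13. Unlike the patterns in the question, it is not anchored with ^, so it skips quoted fields anywhere in the line. It also treats "" as part of the field rather than as a closing quote, which is what separates line 2 from line 3.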
Shawn answered Jan 13 '23 19:01
