
Perl split and regular expression

Tags: regex, split, perl

I have the following string:

'100% California Grown Olives, Water, Salt And Ferrous Gluconate (An,Iron, Derivative),asasd, sadasda'

I'm trying to split it on /,/ but only where the comma is not inside parentheses. In this case the result should be:

100% California Grown Olives
Water
Salt And Ferrous Gluconate (An,Iron, Derivative)
asasd
sadasda

thanks,

snoofkin asked Dec 12 '11 21:12



2 Answers

@result = split(m/,(?![^()]*\))/, $subject);

This splits on a comma only if the next parenthesis after it (if any) is not a closing parenthesis. As Jack Maney correctly noted, this breaks down if parentheses can be nested.

Explanation:

,       # Match a comma.
(?!     # Assert that it's impossible to match...
 [^()]* # any number of non-parenthesis characters
 \)     # followed by a closing parenthesis
)       # End of lookahead assertion
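
To try the lookahead split on the string from the question, something like the following should do. It is only a sketch: `$subject` and `@result` follow the answer's names, the trimming step is an addition, and the nested input `$nested_input` is a made-up example to show the limitation mentioned above.

```perl
use strict;
use warnings;

my $subject = '100% California Grown Olives, Water, Salt And Ferrous Gluconate (An,Iron, Derivative),asasd, sadasda';

# Split on commas whose next parenthesis (if any) is not a closing one.
my @result = split /,(?![^()]*\))/, $subject;

# Added step (not in the answer): trim whitespace left around each piece.
s/^\s+|\s+$//g for @result;

print "$_\n" for @result;

# Hypothetical nested input: the comma after "b" is followed by "(" before
# any ")", so the lookahead does not protect it and the split cuts inside
# the outer parentheses.
my $nested_input = 'a (b, (c), d), e';
my @nested = split /,(?![^()]*\))/, $nested_input;
print scalar(@nested), " pieces for the nested input\n";
```

For the question's string this yields the five pieces asked for; for the nested input it yields three pieces, the first being `a (b`, which is the failure case described above.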
Tim Pietzcker answered Oct 26 '22 05:10



First you need to decide what counts as parentheses and whether they can be nested (for this answer, I will assume that they can). Then remove each parenthesized block from the text and replace it with a placeholder:

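my @parens;
# Stash each balanced (possibly nested) paren block and leave a numbered placeholder.
# (?0) recurses into the whole pattern; recursion needs Perl 5.10 or later.
$str =~ s/( \( (?: (?0)|[^()] )* \) )/push @parens, $1; "PARENS_$#parens"/gex;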

So now you are left with something that looks like:

'100% California Grown Olives, Water, Salt And Ferrous Gluconate PARENS_0,asasd, sadasda'

Now it is simple to split on commas. Then, in each of the split pieces, scan for PARENS_\d+ tokens and replace them with the matching entries from the @parens array. You might need a more distinctive placeholder name, depending on your source content.

Something like:

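use feature 'say';  # say needs Perl 5.10 or later

# Split on commas, then restore the stashed paren blocks in each segment.
s/PARENS_(\d+)/$parens[$1]/ge for my @segs = split /,\s*/ => $str;

say for @segs;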

which for an example string:

my $str = "foo (b,a,r), baz (foo, (bar), baz), biz";

prints:

foo (b,a,r)
baz (foo, (bar), baz)
biz
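
Putting the steps above together, a self-contained version might look like the sketch below. The sub name `split_outside_parens` is made up for illustration, and it assumes Perl 5.10 or later (for the recursive pattern) and that the input never contains the literal text `PARENS_` followed by digits.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas outside (possibly nested) parentheses.
sub split_outside_parens {
    my ($str) = @_;
    my @parens;

    # Step 1: swap each balanced paren block for a numbered placeholder.
    # (?1) recurses into group 1, so a nested block is consumed as one unit.
    $str =~ s/( \( (?: [^()]++ | (?1) )* \) )/push @parens, $1; 'PARENS_' . $#parens/gex;

    # Step 2: no commas are hidden inside parentheses now, so a plain split works.
    my @segs = split /\s*,\s*/, $str;

    # Step 3: put the original paren blocks back into each segment.
    s/PARENS_(\d+)/$parens[$1]/g for @segs;

    return @segs;
}

print "$_\n" for split_outside_parens('foo (b,a,r), baz (foo, (bar), baz), biz');
```

An unbalanced parenthesis is simply not matched in step 1, so commas after it are split as usual; whether that is acceptable depends on your data.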
Eric Strom answered Oct 26 '22 05:10
