I have the following string:
'100% California Grown Olives, Water, Salt And Ferrous Gluconate (An,Iron, Derivative),asasd, sadasda'
I'm trying to split it on /,/
but only where the comma is not inside parentheses. For instance, in this case the result should be:
100% California Grown Olives
Water
Salt And Ferrous Gluconate (An,Iron, Derivative)
asasd
sadasda
thanks,
@result = split(m/,(?![^()]*\))/, $subject);
This splits on a comma only if the next parenthesis after it (if any) isn't a closing parenthesis. As Jack Maney correctly noted, this can fail if nested parentheses occur.
Explanation:
, # Match a comma.
(?! # Assert that it's impossible to match...
[^()]* # any number of non-parenthesis characters
\) # followed by a closing parenthesis
) # End of lookahead assertion
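As a rough check of the lookahead approach, here is a minimal sketch that runs the same regex on the string from the question. The helper name split_outside_parens is hypothetical (not part of the original answer), and the whitespace trimming is an extra step added so the pieces match the desired output.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps the split shown above.
sub split_outside_parens {
    my ($subject) = @_;
    # Split on a comma unless a ")" follows with no other parenthesis in between.
    return split /,(?![^()]*\))/, $subject;
}

my $subject = '100% California Grown Olives, Water, Salt And Ferrous Gluconate (An,Iron, Derivative),asasd, sadasda';
my @result = split_outside_parens($subject);

# The split keeps the blanks that followed each comma, so trim them.
s/^\s+|\s+$//g for @result;
print "$_\n" for @result;

# Nested parentheses defeat the lookahead: the comma after "b" is followed
# by "(" rather than ")", so the regex wrongly splits there.
my @nested = split_outside_parens('a (b, (c), d), e');
print scalar(@nested), " pieces for the nested case\n";   # 3, although 2 were wanted
```

For input without nesting this should be sufficient; the nested case at the end illustrates the limitation mentioned above.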
First you need to decide what counts as parens, and whether they can be nested (for this answer, I will assume that they can). Then you need to remove those paren blocks from the text and replace each one with a placeholder:
my @parens;
$str =~ s/( \( (?: (?0)|[^()] )* \) )/push @parens, $1; "PARENS_$#parens"/gex;
So now you are left with something that looks like:
'100% California Grown Olives, Water, Salt And Ferrous Gluconate PARENS_0,asasd, sadasda'
It is now simple to split on commas. Then, in each of the split pieces, scan for PARENS_\d+ tokens and replace them with the corresponding entries from the @parens array. You might need a more unique placeholder name depending on your source content.
Something like:
use feature 'say'; # needed for say (Perl 5.10+)
s/PARENS_(\d+)/$parens[$1]/ge for my @segs = split /,\s*/ => $str;
say for @segs;
which for an example string:
my $str = "foo (b,a,r), baz (foo, (bar), baz), biz";
prints:
foo (b,a,r)
baz (foo, (bar), baz)
biz
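Putting the steps together, a self-contained sketch might look like the following. The sub name split_top_level is hypothetical, and the PARENS_ placeholder is assumed not to occur in the input text; pick a different token if it might.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas outside (possibly nested) parentheses.
sub split_top_level {
    my ($str) = @_;
    my @parens;

    # Step 1: replace each balanced, outermost (...) block with a numbered
    # placeholder. (?0) recurses into the whole pattern to handle nesting.
    $str =~ s/( \( (?: (?0) | [^()] )* \) )/push @parens, $1; "PARENS_$#parens"/gex;

    # Step 2: with the parenthesised text hidden, a plain comma split is safe.
    my @segs = split /,\s*/, $str;

    # Step 3: put the saved blocks back in place of their placeholders.
    s/PARENS_(\d+)/$parens[$1]/g for @segs;

    return @segs;
}

print "$_\n" for split_top_level("foo (b,a,r), baz (foo, (bar), baz), biz");
print "$_\n" for split_top_level('100% California Grown Olives, Water, Salt And Ferrous Gluconate (An,Iron, Derivative),asasd, sadasda');
```

Unbalanced parentheses are simply left in place by step 1, so a stray "(" does not protect the commas after it.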