
Separator Retention Mode in perl split()

Tags: split, perl

split /PATTERN/,EXPR

I read the following in a book,

When you use a pattern in split, be sure to avoid memory parentheses in the pattern since these trigger separator retention mode.

I can't seem to find the documentation that explains this in detail. Could someone please briefly explain separator retention mode and its possible uses?

Anirudh Ramanathan asked Sep 19 '12 18:09




2 Answers

This is documented in perldoc -f split towards the end (in-code commentary is my own):

If the PATTERN contains capturing groups, then for each separator, an additional field is produced for each substring captured by a group (in the order in which the groups are specified, as per backreferences); if any group does not match, then it captures the undef value instead of a substring. Also, note that any such additional field is produced whenever there is a separator (that is, whenever a split occurs), and such an additional field does not count towards the LIMIT. Consider the following expressions evaluated in list context (each returned list is provided in the associated comment):

split(/-|,/, "1-10,20", 3)       # ('1', '10', '20')
                                 # No retention, '-', ',' consumed

split(/(-|,)/, "1-10,20", 3)     # ('1', '-', '10', ',', '20')
                                 # Split on and retain '-' or ','
                                 # 5 elements returned

split(/-|(,)/, "1-10,20", 3)     # ('1', undef, '10', ',', '20')
                                 # undef because '-' matched outside the group

split(/(-)|,/, "1-10,20", 3)     # ('1', '-', '10', undef, '20')
                                 # undef because ',' matched outside the group

split(/(-)|(,)/, "1-10,20", 3)   # ('1', '-', undef, '10', undef, ',', '20')
                                 # one extra field per capturing group for each
                                 # separator: when (-) matches '-', (,) did not
                                 # take part in the match and yields undef
                                 # 7 elements (!)

So, two interesting quirks that may catch out the unwary:

  • The generation of undefs in list context whenever a capturing group does not match, but something else in PATTERN does

  • You might split with a capture group, specifying LIMIT as $n, and the resultant list has more than $n elements
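Both quirks can be worked around. The following is a minimal sketch (the variable names are my own, purely for illustration): a non-capturing group `(?:...)` groups the alternation without triggering retention, `grep` with `defined` drops the undef placeholders, and the last call shows that LIMIT counts only the split fields, not the retained separators.

```perl
use strict;
use warnings;

my $str = "1-10,20";

# (?:...) groups without capturing, so no separators are retained.
my @plain = split(/(?:-|,)/, $str);

# Two capture groups produce undef placeholders; filter them out.
my @kept = grep { defined $_ } split(/(-)|(,)/, $str);

# LIMIT of 2 gives two split fields plus one retained separator.
my @limited = split(/(-|,)/, $str, 2);

print join('|', @plain),   "\n";   # 1|10|20
print join('|', @kept),    "\n";   # 1|-|10|,|20
print join('|', @limited), "\n";   # 1|-|10,20
```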

Zaid answered Sep 26 '22 20:09


It means that if you split on a regex containing capturing parentheses (the kind that set $1, $2 and so on), the matched separators are retained and returned in the list along with the split values.
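As for possible usage, one case where retaining the separators is handy is simple tokenizing, where the separators themselves carry meaning. A small sketch, assuming a made-up arithmetic string with single-character operators (the `$expr` and `@tokens` names are illustrative only):

```perl
use strict;
use warnings;

my $expr = "3+4*10-2";

# The capturing group keeps each operator as its own list element.
my @tokens = split(m{([-+*/])}, $expr);

print "@tokens\n";            # 3 + 4 * 10 - 2

# Nothing is lost, so joining the tokens gives back the input.
print join('', @tokens), "\n";   # 3+4*10-2
```

Without the parentheses, the same split would return only the numbers and the operators would be discarded.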

Len Jaffe answered Sep 26 '22 20:09