split /PATTERN/,EXPR
I read the following in a book:
"When you use a pattern in split, be sure to avoid memory parentheses in the pattern, since these trigger separator retention mode."
I can't seem to find the documentation that explains this in detail. Could someone please briefly explain separator retention mode and its possible uses?
A string is split on the delimiter specified by the pattern. By default, whitespace is assumed as the delimiter. The split syntax is: split /pattern/, variableName.
split() is a string function in Perl that cuts a string into smaller pieces. A string can be split on different criteria, such as a single character, a regular expression (pattern), a group of characters, or an undefined value.
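As a rough illustration of the criteria listed above, here is a minimal sketch. The sample strings and variable names are made up for the example; the undefined-value case is not covered.

```perl
use strict;
use warnings;

# Split on a single character.
my @by_char = split /,/, 'a,b,c';

# Split on a regular expression: one or more digits.
my @by_regex = split /\d+/, 'x1y22z';

# Split on a group of characters (a multi-character delimiter).
my @by_group = split /::/, 'Foo::Bar::Baz';

# With no arguments, split works on $_ and splits on whitespace,
# skipping any leading whitespace.
my @by_default;
for ("  one two\tthree ") {
    @by_default = split;
}

print join('|', @by_char),    "\n";   # a|b|c
print join('|', @by_regex),   "\n";   # x|y|z
print join('|', @by_group),   "\n";   # Foo|Bar|Baz
print join('|', @by_default), "\n";   # one|two|three
```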
Using the split function:

    my $x = 'welcome';
    my @arr = split(//, $x);
    print "@arr";

Here split is called on the string $x with an empty pattern as the delimiter. Since the delimiter is empty, split splits between every pair of characters, so the array holds each individual character.
I use the following perl code: ($First,$Last) = split "." , $string; I have also used: ($First,$Last) = split '.
This is documented in perldoc -f split, towards the end (in-code commentary is my own):
If the PATTERN contains capturing groups, then for each separator, an additional field is produced for each substring captured by a group (in the order in which the groups are specified, as per backreferences); if any group does not match, then it captures the undef value instead of a substring. Also, note that any such additional field is produced whenever there is a separator (that is, whenever a split occurs), and such an additional field does not count towards the LIMIT. Consider the following expressions evaluated in list context (each returned list is provided in the associated comment):

    split(/-|,/, "1-10,20", 3)
    # ('1', '10', '20')
    # No retention, '-', ',' consumed

    split(/(-|,)/, "1-10,20", 3)
    # ('1', '-', '10', ',', '20')
    # Split on and retain '-' or ','
    # 5 elements returned

    split(/-|(,)/, "1-10,20", 3)
    # ('1', undef, '10', ',', '20')
    # undef because '-' matches

    split(/(-)|,/, "1-10,20", 3)
    # ('1', '-', '10', undef, '20')
    # undef because ',' matches

    split(/(-)|(,)/, "1-10,20", 3)
    # ('1', '-', undef, '10', undef, ',', '20')
    # one match per capturing group. (-) matches -, but
    # (,) returns undef on trying to match -.
    # 7 elements (!)
So, two interesting quirks that may catch out the unwary:
1. The generation of undefs in list context whenever a capturing group does not match, but something else in PATTERN does.
2. You might split with a capture group, specifying LIMIT as $n, and the resultant list has more than $n elements.
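Both quirks can be reproduced in a few lines. This is only a sketch: the input strings are made up, and show() is a hypothetical helper added here so that undef prints visibly instead of triggering a warning.

```perl
use strict;
use warnings;

# Hypothetical helper: render a list with undef shown explicitly.
sub show {
    return join(', ', map { defined($_) ? "'$_'" : 'undef' } @_);
}

# Quirk 2: LIMIT is 2, yet the capture group adds an extra field
# for the separator, so three elements come back.
my @fields = split /(-)/, '1-10-20', 2;
print scalar(@fields), ": ", show(@fields), "\n";   # 3: '1', '-', '10-20'

# Quirk 1: '-' matches via the non-capturing alternative, so the
# group (,) does not take part in that match and yields undef.
my @mixed = split /-|(,)/, '1-10,20';
print show(@mixed), "\n";                           # '1', undef, '10', ',', '20'

# If the undef placeholders are unwanted, filter them out.
my @clean = grep { defined($_) } @mixed;
print show(@clean), "\n";                           # '1', '10', ',', '20'
```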
It means that if you use a regex with capturing parentheses (the kind that generate backreferences), the matched separators are retained and returned in the list along with the split values.
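As for a possible use: retaining separators is handy when the separators themselves carry meaning, for example the operators in an arithmetic expression. The sketch below uses a made-up expression and variable names; it also shows that a non-capturing group, (?:...), gives grouping without triggering retention.

```perl
use strict;
use warnings;

# Capturing group: the operators are kept as tokens in the result.
my @tokens = split m{\s*([-+*/])\s*}, '3 + 4*10 - 2';
print join(' ', @tokens), "\n";    # 3 + 4 * 10 - 2

# Non-capturing group: same grouping, but the separators are dropped.
my @numbers = split /(?:-|,)/, '1-10,20';
print join(' ', @numbers), "\n";   # 1 10 20
```

If grouping is needed only for alternation or quantifiers, (?:...) avoids the extra fields entirely.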