
How is the split function in Perl supposed to work?

Tags: regex, split, perl

I was having some difficulty with the split function today, and read through perlfunc to see if I had interpreted something incorrectly. I was attempting to split a string on '.', which according to perlfunc should be supported thusly:

my $string = "hello.world";
my ($hello, $world) = split(".", $string);

or

my $string = "hello.world";
my ($hello, $world) = split(/\./, $string);

However, testing the first resulted in empty variables, so I extended my testing to the following:

#!/usr/bin/perl

use strict;
use warnings;

my $time_of_test = "13.11.19.11.45.07";
print "TOD: $time_of_test\n";
my ($year, $month, $day, $hr, $min, $sec) = split(/\./, $time_of_test);
print "Test 1 -- Year: $year month: $month day: $day hour: $hr min: $min sec: $sec\n";
($year, $month, $day, $hr, $min, $sec) = split(".", $time_of_test);
print "Test 2 -- Year: $year month: $month day: $day hour: $hr min: $min sec: $sec\n";
($year, $month, $day, $hr, $min, $sec) = split('.', $time_of_test);
print "Test 3 -- Year: $year month: $month day: $day hour: $hr min: $min sec: $sec\n";
($year, $month, $day, $hr, $min, $sec) = split("\.", $time_of_test);
print "Test 4 -- Year: $year month: $month day: $day hour: $hr min: $min sec: $sec\n";
($year, $month, $day, $hr, $min, $sec) = split('\.', $time_of_test);
print "Test 5 -- Year: $year month: $month day: $day hour: $hr min: $min sec: $sec\n";

Here is the output:

> ./test.pl  
TOD: 13.11.19.11.45.07
Test 1 -- Year: 13 month: 11 day: 19 hour: 11 min: 45 sec: 07
Test 2 -- Year:  month:  day:  hour:  min:  sec: 
Test 3 -- Year:  month:  day:  hour:  min:  sec: 
Test 4 -- Year:  month:  day:  hour:  min:  sec: 
Test 5 -- Year: 13 month: 11 day: 19 hour: 11 min: 45 sec: 07

Is this working as intended? If so, how did I misinterpret the perlfunc documentation?

asked Dec 12 '22 09:12 by Bts


2 Answers

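The first argument to split is always treated as a regular expression, even when you write it as a string. You should never use a string here (except for the special case of " ", which makes split skip leading whitespace and split on runs of whitespace), because the quotes make it look like a literal separator when it isn't one.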
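To see how the " " special case differs from an ordinary pattern, here is a minimal sketch; the sample string is made up for illustration. It compares split ' ' (the awk-style special case) with split / /, which is a plain regex matching exactly one space.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative input with leading and repeated spaces.
my $line = "  alpha   beta gamma";

# A string consisting of a single space is the awk-style special case:
# leading whitespace is skipped and each run of whitespace is one separator.
my @awk_style = split ' ', $line;

# The regex / / gets no special treatment: every single space is a
# separator, so the leading and repeated spaces produce empty fields.
my @one_space = split / /, $line;

print scalar(@awk_style), " fields: ", join('|', @awk_style), "\n";
# prints: 3 fields: alpha|beta|gamma
print scalar(@one_space), " fields: ", join('|', @one_space), "\n";
# prints: 7 fields: ||alpha|||beta|gamma
```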

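The reason you got no results when splitting with "." and '.' is that split interprets both as the regular expression ., which matches any character (except newline). Every character in the string becomes a separator, so every field between them is empty.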
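A minimal sketch of that behavior, using the "hello.world" string from the question. The explicit LIMIT of -1 is an addition not found in the question's code; it makes split keep every field so the empty fields can be counted.

```perl
use strict;
use warnings;

my $string = "hello.world";

# '.' is just a string, but split compiles it as the regex /./, which
# matches any character except newline. Every character is therefore a
# separator, and every field between two separators is empty.

# A negative LIMIT keeps all fields, including trailing empty ones:
# 11 separators give 12 empty fields.
my @all_fields = split '.', $string, -1;
printf "LIMIT -1: %d fields, %d non-empty\n",
    scalar(@all_fields), scalar(grep { length($_) > 0 } @all_fields);
# prints: LIMIT -1: 12 fields, 0 non-empty

# With LIMIT omitted, trailing empty fields are stripped, and since every
# field is empty, nothing is left at all.
my @stripped = split '.', $string;
printf "LIMIT omitted: %d fields\n", scalar(@stripped);
# prints: LIMIT omitted: 0 fields

# The question's list assignment: both variables end up empty.
my ($hello, $world) = split '.', $string;
printf "hello=[%s] world=[%s]\n", $hello // '', $world // '';
```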

With /\./ and '\.' you got the expected results, because in both cases split received the escaped pattern \., which matches only a literal period. (Single quotes leave the backslash in place.)
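A minimal sketch, reusing the timestamp string from the question, that checks what the single-quoted '\.' actually contains and confirms that splitting on it gives the same fields as /\./:

```perl
use strict;
use warnings;

my $time_of_test = "13.11.19.11.45.07";

# Single quotes leave the backslash in place, so '\.' is the
# two-character string backslash + period; split compiles it as \.
my $pattern = '\.';
print length($pattern), "\n";            # prints 2

my @from_string = split $pattern, $time_of_test;
my @from_regex  = split /\./,     $time_of_test;

print "@from_string\n";                  # prints 13 11 19 11 45 07
print "same result\n" if "@from_string" eq "@from_regex";
```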

You didn't get any results for "\." because the double-quoted string processes escape sequences first: \. is not a special escape, so Perl drops the backslash and the string is just a period. By the time it reached the split call, it was the same as ".".
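A quick sketch of what the double-quoted literal "\." turns into, and how doubling the backslash preserves it. The short string in the final split is illustrative only.

```perl
use strict;
use warnings;

# \. is not a recognized escape in double quotes, so Perl drops the
# backslash and "\." is the one-character string ".".
print "identical\n" if "\." eq '.';
print length("\."), "\n";     # prints 1

# Escaping the backslash keeps it: "\\." is backslash + period.
print length("\\."), "\n";    # prints 2

# So "\\." works as a split pattern where "\." does not.
my @parts = split "\\.", "13.11.19";
print "@parts\n";             # prints 13 11 19
```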

answered Dec 22 '22 18:12 by AKHolland


The string literal '\n' produces the two-character string \n (a backslash followed by n). Likewise, the string literal '.' produces a string containing just a period, and split expects that string to be a regular expression. In a regular expression, . matches any character except newline. The regular expression \. matches a literal period, and that string can be created from the string literal '\.' or '\\.'. It's simpler and less misleading to write the pattern as /\./, though, since a regex literal needs no extra layer of string escaping.
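A minimal sketch confirming that '\.' and '\\.' produce the same string. It then goes a step beyond the answers above: when the separator comes from a variable (the $sep below is hypothetical), Perl's built-in quotemeta, or \Q...\E inside a regex literal, can do the escaping for you.

```perl
use strict;
use warnings;

# In single quotes, \\ becomes one backslash and a backslash before any
# other character is kept, so both literals are backslash + period.
print "same string\n" if '\.' eq '\\.';

my $time_of_test = "13.11.19.11.45.07";

# Hypothetical separator held in a variable; quotemeta('.') returns '\.',
# so the period is matched literally.
my $sep   = '.';
my @parts = split quotemeta($sep), $time_of_test;
print scalar(@parts), " fields: @parts\n";   # prints 6 fields: 13 11 19 11 45 07

# \Q...\E performs the same escaping inside a regex literal.
my @same = split /\Q$sep\E/, $time_of_test;
print "identical\n" if "@parts" eq "@same";
```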

answered Dec 22 '22 18:12 by ikegami