I want to split a string using runs of a repeated letter as the delimiter. For example, "123aaaa23a3" should be split into ('123', '23a3'), while "123abc4" should be left unchanged.
So I tried this:
@s = split /([[:alpha:]])\1+/, '123aaaa23a3';
But this returns '123', 'a', '23a3', which is not what I wanted. I know this happens because the 'a' matched by the parentheses is captured, and split() returns captured groups along with the fields. But I can't just add something like ?: to make the group non-capturing, since [[:alpha:]] must be captured for the backreference.
How can I resolve this situation?
Hmm, it's an interesting one. My first thought would be: the captured delimiter will always land at the odd indices of the returned list, so you can just discard the odd-indexed array elements.
Something like this perhaps?:
my %s = (split (/([[:alpha:]])\1+/, '123aaaa23a3'), '' );
print Dumper \%s;
This'll give you:
$VAR1 = {
'23a3' => '',
'123' => 'a'
};
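So you can extract the fields you want via keys. Bear in mind that hash keys come back in no particular order, and repeated fields collapse into a single key.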
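The same "discard the odd-indexed elements" idea can also be applied with an array slice, which keeps the original order and any duplicate fields (a hash does neither). This is only a minimal sketch; the variable names are illustrative:

```perl
use strict;
use warnings;

my $str = '123aaaa23a3';

# split returns: field, captured letter, field, captured letter, ...
my @parts = split /([[:alpha:]])\1+/, $str;

# Keep the even indices (the fields); drop the odd ones (the captured letters).
my @fields = @parts[ grep { $_ % 2 == 0 } 0 .. $#parts ];

print join( ', ', @fields ), "\n";    # prints: 123, 23a3
```

For a string with no repeated letters, such as '123abc4', split returns a single element, and the slice leaves it untouched.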
Unfortunately, my second approach of 'selecting out' the pattern matches via %+ doesn't particularly help, because split doesn't seem to populate the regex capture variables.
But a plain match does, something like this:
my @delims = '123aaaa23a3' =~ m/(?<delim>[[:alpha:]])\g{delim}+/g;
print Dumper \%+;
By using a named capture, we can identify that a is what the capture group matched. Since this doesn't seem to be populated when you match via split, it might lead to a two-pass approach.
This is the closest I got:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $str = '123aaaa23a3';
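# Collect each letter that occurs two or more times in a row.
my @letters = $str =~ m/([[:alpha:]])\1+/g;
my @s;
if (@letters) {
    # Build a non-capturing alternation such as (?:a{2,}).
    my $regex = join '|', map { $_ . '{2,}' } @letters;
    $regex = qr/(?:$regex)/;
    print "Using: $regex\n";
    # Split on the regex.
    @s = split m/$regex/, $str;
}
else {
    # No repeated letters: an empty pattern would split between every
    # character, so keep the string whole instead.
    @s = ($str);
}
print Dumper \@s;
We first process the string to extract the letters that occur two or more times in a row, to use as our delimiters. Then we assemble a regex out of them, non-capturing, so we can split on it. If the string contains no repeated letters (as in '123abc4'), we skip the split and keep the string unchanged.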
One solution would be to use your original split
call and throw away every other value. Conveniently, List::Util::pairkeys
is a function that keeps the first of every pair of values in its input list:
use List::Util 1.29 qw( pairkeys );
my @vals = pairkeys split /([[:alpha:]])\1+/, '123aaaa23a3';
Gives
Odd number of elements in pairkeys at (eval 6) line 1.
[ '123', '23a3' ]
That warning comes from the fact that pairkeys
wants an even-sized list. We can solve that by adding one more value at the end:
my @vals = pairkeys split( /([[:alpha:]])\1+/, '123aaaa23a3' ), undef;
Alternatively, and maybe a little neater, you can add that extra value at the start of the list and use pairvalues
instead:
use List::Util 1.29 qw( pairvalues );
my @vals = pairvalues undef, split /([[:alpha:]])\1+/, '123aaaa23a3';
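One edge case is worth checking with either variant. When the string ends with a run of repeated letters, split drops the trailing empty field, so the list length changes parity and the "Odd number of elements" warning can come back. The sketch below, assuming List::Util 1.29 or later, passes a limit of -1 to split so that trailing fields are kept; the helper name split_on_repeats is hypothetical.

```perl
use strict;
use warnings;
use List::Util 1.29 qw( pairvalues );

# Hypothetical helper: split on runs of a repeated letter, keeping only the fields.
sub split_on_repeats {
    my ($str) = @_;

    # split returns an empty list for an empty string, so handle that first.
    return unless length $str;

    # A limit of -1 keeps trailing empty fields, so split always returns an
    # odd-length list (field, letter, field, ...). The leading undef makes it
    # even, and pairvalues then keeps every second element: the fields.
    return pairvalues undef, split /([[:alpha:]])\1+/, $str, -1;
}

print join( ',', split_on_repeats('123aaaa23a3') ), "\n";    # prints: 123,23a3
print join( ',', split_on_repeats('123abc4') ),     "\n";    # prints: 123abc4
```

With the -1 limit, a string that ends in a delimiter yields a trailing empty field, which the caller can drop if it is not wanted.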