Perl split function - use repeating characters as delimiter

I want to split a string using runs of a repeated letter as the delimiter. For example, "123aaaa23a3" should be split as ('123', '23a3') while "123abc4" should be left unchanged.
So I tried this:

@s = split /([[:alpha:]])\1+/, '123aaaa23a3';

But this returns '123', 'a', '23a3', which is not what I wanted. I know this happens because the 'a' matched inside the parentheses is captured and thus preserved by split(). However, I can't add something like ?: since [[:alpha:]] must be captured for the backreference. How can I resolve this situation?

Hmm, it's an interesting one. My first thought would be - the captured delimiter will always land at the odd-numbered indices (1, 3, ...) of the returned list, so you can just discard any odd-numbered array elements.

Something like this perhaps?:

my %s = (split (/([[:alpha:]])\1+/, '123aaaa23a3'), '' );
print Dumper \%s;

This'll give you:

$VAR1 = {
          '23a3' => '',
          '123' => 'a'
        };

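So you can extract your fields via keys. Bear in mind that a hash does not preserve the order of the fields and collapses any field that occurs more than once.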
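
If the order of the fields matters, the "discard the odd-numbered elements" idea can also be done with an array slice instead of a hash. This is only a minimal sketch: split_on_repeats is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;

# split_on_repeats is a hypothetical helper name used only for this sketch.
sub split_on_repeats {
    my ($str) = @_;
    my @parts = split /([[:alpha:]])\1+/, $str;

    # Fields sit at even indices (0, 2, ...); the captured letters sit at
    # the odd indices in between, so keep only the even ones.
    return @parts[ grep { $_ % 2 == 0 } 0 .. $#parts ];
}

my @fields = split_on_repeats('123aaaa23a3');
print join( ', ', @fields ), "\n";    # prints "123, 23a3"
```

A string without repeated letters, such as '123abc4', comes back as a single field because split finds nothing to split on.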

Unfortunately my second approach of 'selecting out' the pattern matches via %+ doesn't help particularly (split doesn't populate the regex capture variables).

But something like this:

my @delims = '123aaaa23a3' =~ m/(?<delim>[[:alpha:]])\g{delim}+/g;
print Dumper \%+;

By using a named capture, we identify that a is from the capture group. Unfortunately, this doesn't seem to be populated when you do this via split - which might lead to a two-pass approach.
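
Since the named capture is available inside an ordinary match loop, one option is to skip split altogether and cut the fields out by position. The following is a rough sketch under that assumption, using the built-in @- and @+ offset arrays; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $str = '123aaaa23a3';
my ( @fields, @delims );
my $start = 0;    # offset where the current field begins

while ( $str =~ m/(?<delim>[[:alpha:]])\g{delim}+/g ) {
    push @delims, $+{delim};                                # the repeated letter
    push @fields, substr( $str, $start, $-[0] - $start );   # text before the run
    $start = $+[0];                                         # resume after the run
}

# Whatever follows the last run is the final field. Unlike split, this
# keeps an empty trailing field when the string ends with a delimiter.
push @fields, substr( $str, $start );

print join( ', ', @fields ), "\n";    # prints "123, 23a3"
```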

This is the closest I got:

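#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my $str = '123aaaa23a3';

# First pass: collect each letter that occurs in a run of two or more.
my @letters = $str =~ m/([[:alpha:]])\1+/g;

# With no repeated letters the alternation would be empty, and splitting
# on an empty pattern breaks the string into single characters, so the
# string is kept whole in that case.
my @s = ($str);
if (@letters) {
    # Build a regex out of '2-or-more' characters, and make it non-capturing.
    my $regex = join( "|", map { $_ . "{2,}" } @letters );
    $regex = qr/(?:$regex)/;
    print "Using: $regex\n";

    # Second pass: split on the regex.
    @s = split m/$regex/, $str;
}

print Dumper \@s;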

We first process the string to extract the runs of two or more identical letters, to use as our delimiters. Then we assemble a non-capturing regex out of them, so that split returns only the fields.
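
A variation on the same idea is to build the non-capturing alternation up front from a fixed alphabet, which avoids the first pass over the string. This is a sketch that assumes only ASCII letters matter ([[:alpha:]] can match more than that); the variable names are illustrative.

```perl
use strict;
use warnings;

# Assumption: only ASCII letters are relevant as delimiters.
# Produces a{2,}|b{2,}|...|Z{2,} with no capture groups at all.
my $alternation = join '|', map { $_ . '{2,}' } 'a' .. 'z', 'A' .. 'Z';
my $delim_re    = qr/(?:$alternation)/;

my @s = split $delim_re, '123aaaa23a3';
my @t = split $delim_re, '123abc4';

print join( ', ', @s ), "\n";    # prints "123, 23a3"
print join( ', ', @t ), "\n";    # prints "123abc4"
```

Because the alternation never comes out empty, a string without repeated letters is returned as a single field.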

One solution would be to use your original split call and throw away every other value. Conveniently, List::Util::pairkeys is a function that keeps the first of every pair of values in its input list:

use List::Util 1.29 qw( pairkeys );

my @vals = pairkeys split /([[:alpha:]])\1+/, '123aaaa23a3';

Gives

Odd number of elements in pairkeys at (eval 6) line 1.
[ '123', '23a3' ]

That warning comes from the fact that pairkeys wants an even-sized list. We can solve that by adding one more value at the end:

my @vals = pairkeys split( /([[:alpha:]])\1+/, '123aaaa23a3' ), undef;

Alternatively, and maybe a little neater, add that extra value at the start of the list and use pairvalues instead:

use List::Util 1.29 qw( pairvalues );

my @vals = pairvalues undef, split /([[:alpha:]])\1+/, '123aaaa23a3';
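
As a quick check that the pairing still lines up when there are several delimiters, here is the same pairvalues call on a longer string. The input string is made up for illustration.

```perl
use strict;
use warnings;
use List::Util 1.29 qw( pairvalues );

# split returns ('1', 'a', '2', 'b', '3cd4'); the leading undef pairs each
# captured letter with the field that follows it.
my @vals = pairvalues undef, split /([[:alpha:]])\1+/, '1aa2bbb3cd4';

print join( ', ', @vals ), "\n";    # prints "1, 2, 3cd4"
```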

Tags: regex, perl

AaronS

2 Answers

Sobrique

LeoNerd
