How can I store captures from a Perl regular expression into separate variables?

I have a regex:

/abc(def)ghi(jkl)mno(pqr)/igs

How would I capture the result of each set of parentheses into 3 different variables, one per set? Right now I am using one array to capture all the results. They come out sequentially, but then I have to parse them, and the list could be huge.

@results = ($string =~ /abc(def)ghi(jkl)mno(pqr)/igs);
asked Feb 14 '10 01:02 by Incognito


2 Answers

Your question is a bit ambiguous to me, but I think you want to do something like this:

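my (@first, @second, @third);
# Match with /g in scalar context so each iteration advances to the next match.
# A list assignment in the condition would collect every match at once and never terminate.
while ($string =~ /abc(def)ghi(jkl)mno(pqr)/igs) {
    push @first, $1;
    push @second, $2;
    push @third, $3;
}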
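
A self-contained sketch of the loop approach, assuming a made-up input string that contains the pattern twice. It matches with /g in scalar context and reads $1, $2 and $3 on each pass, so the loop ends once no further match is found:

```perl
use strict;
use warnings;

# Hypothetical input with two occurrences of the pattern (the second in upper case).
my $string = 'abcdefghijklmnopqr ABCDEFGHIJKLMNOPQR';

my (@first, @second, @third);

# /g in scalar context: each iteration resumes where the previous match ended.
while ($string =~ /abc(def)ghi(jkl)mno(pqr)/gi) {
    push @first,  $1;
    push @second, $2;
    push @third,  $3;
}

print "first:  @first\n";     # first:  def DEF
print "second: @second\n";    # second: jkl JKL
print "third:  @third\n";     # third:  pqr PQR
```

Each array grows by one element per match, so nothing has to be untangled from a flat list afterwards.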
answered Nov 08 '22 20:11 by Leon Timmermans


Starting with 5.10, you can use named capture buffers as well:

#!/usr/bin/perl

use strict; use warnings;

my %data;

my $s = 'abcdefghijklmnopqr';

if ($s =~ /abc (?<first>def) ghi (?<second>jkl) mno (?<third>pqr)/x ) {
    push @{ $data{$_} }, $+{$_} for keys %+;
}

use Data::Dumper;
print Dumper \%data;

Output:

$VAR1 = {
          'first' => [
                       'def'
                     ],
          'second' => [
                        'jkl'
                      ],
          'third' => [
                       'pqr'
                     ]
        };
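
The example above handles a single match. Because the question's regex carries the /g flag, here is a hedged sketch of the same named-capture idea applied to repeated matches. The input string is invented for illustration, and Perl 5.10 or later is assumed:

```perl
use strict;
use warnings;

my %data;

# Hypothetical input with two occurrences of the pattern.
my $s = 'abcdefghijklmnopqr abcDEFghiJKLmnoPQR';

# Scalar-context /g steps through the matches one at a time;
# %+ holds the named captures of the current match.
while ($s =~ /abc (?<first>def) ghi (?<second>jkl) mno (?<third>pqr)/xgi) {
    push @{ $data{$_} }, $+{$_} for keys %+;
}

printf "%s: %s\n", $_, join(' ', @{ $data{$_} }) for sort keys %data;
# first: def DEF
# second: jkl JKL
# third: pqr PQR
```

Every name ends up with one array element per match, in match order.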

For earlier versions, you can use the following which avoids having to add a line for each captured buffer:

#!/usr/bin/perl

use strict; use warnings;

my $s = 'abcdefghijklmnopqr';

my @arrays = \ my(@first, @second, @third);

if (my @captured = $s =~ /abc (def) ghi (jkl) mno (pqr) /x ) {
    push @{ $arrays[$_] }, $captured[$_] for 0 .. $#arrays;
}

use Data::Dumper;
print Dumper @arrays;

Output:

$VAR1 = [
          'def'
        ];
$VAR2 = [
          'jkl'
        ];
$VAR3 = [
          'pqr'
        ];

However, I prefer keeping related data in a single data structure, so I would go back to using a hash. This does require an auxiliary array of key names:

my %data;
my @keys = qw( first second third );

if (my @captured = $s =~ /abc (def) ghi (jkl) mno (pqr) /x ) {
    push @{ $data{$keys[$_]} }, $captured[$_] for 0 .. $#keys;
}

Or, if the names of the variables really are first, second, etc., or if only the order of the buffers matters and not their names, you can use:

my @data;
if ( my @captured = $s =~ /abc (def) ghi (jkl) mno (pqr) /x ) {
    push @{ $data[$_] }, $captured[$_] for 0 .. $#captured;
}
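
The question mentions already having every capture in one flat, sequential list. As a rough sketch under that assumption (the input string is made up), the list can be regrouped three items at a time with splice, which fills one array per capture position:

```perl
use strict;
use warnings;

# Hypothetical input with two occurrences of the pattern.
my $string = 'abcdefghijklmnopqr abcDEFghiJKLmnoPQR';

# List context with /g returns the captures of every match as one flat list.
my @results = $string =~ /abc(def)ghi(jkl)mno(pqr)/gi;

my @data;    # $data[0] collects all first captures, $data[1] all second, ...

# splice removes three items per pass and returns an empty list when none remain.
while (my @group = splice @results, 0, 3) {
    push @{ $data[$_] }, $group[$_] for 0 .. $#group;
}

print join(' ', @$_), "\n" for @data;
# def DEF
# jkl JKL
# pqr PQR
```

Note that splice empties @results as it goes, so copy the array first if the flat list is still needed.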
answered Nov 08 '22 19:11 by Sinan Ünür