
How can I iterate over regular expression match variables in Perl?

I have a long regular expression that parses a text file into various match variables.

To keep the regex robust, the captures are fairly permissive, so the match variables are likely to contain whitespace. I'd like to remove that whitespace in a systematic way by iterating over the match variables.

For example, I have match variables $2 through $14 that contain some whitespace.

I could do:

my @columns = my ($serNum, $helixID, $initResName, $initChainID,
$initSeqNum, $initIcode, $endResName, $endChainID, $endSeqNum,
$endICode, $helixClass, $comment, $length) = 
($2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14);

### Remove whitespace                       
foreach my $element (0..$#columns) {
    $columns[$element] =~ s/^\s+//;
    $columns[$element] =~ s/\s+$//;
}

But this only removes the whitespace from the elements of @columns, and leaves the properly named scalars ($serNum, $helixID, etc.) untouched.

Is there a way to remove the whitespace from each of the match variables before I copy them to better-named scalars, or is there a way to iterate over the named scalars themselves and remove the whitespace there?

I presume there might be some way to do this with references.

EMiller asked Jun 29 '10 19:06


2 Answers

You can store the match variables in an array first, then strip the whitespace using map:

my @matches = ($2, $3, $4, ...);

my ($serNum, $helixID, ...) 
  = map { (my $v = $_) =~ s/^\s+|\s+$//g; $v } @matches;
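
A minimal, self-contained sketch of this, for anyone who wants to run it. The input line, the short regex, and the $raw* variable names are made up for illustration; the question's real pattern is much longer. It also shows an in-place alternative: a foreach loop aliases $_ to each variable in its list, so the named scalars can be trimmed directly after assignment.

```perl
use strict;
use warnings;

# Hypothetical input line and a short stand-in regex (assumptions; the
# question's real pattern is much longer). $1 is skipped, as in the question.
my $line = 'HELIX|  1 | HA |GLY  ';
$line =~ /^(\w+)\|([^|]*)\|([^|]*)\|([^|]*)$/
    or die "line did not match\n";
my @matches = ($2, $3, $4);          # plain copies of the captures

# map approach: trim a copy of each value, then assign to named scalars.
my ($serNum, $helixID, $initResName)
    = map { (my $v = $_) =~ s/^\s+|\s+$//g; $v } @matches;
print "[$serNum] [$helixID] [$initResName]\n";    # prints "[1] [HA] [GLY]"

# In-place alternative: assign first, then loop over the named scalars.
# foreach aliases $_ to each variable, so s/// edits the variables themselves.
my ($rawSer, $rawHelix, $rawRes) = @matches;
for ($rawSer, $rawHelix, $rawRes) {
    s/^\s+//;
    s/\s+$//;
}
print "[$rawSer] [$rawHelix] [$rawRes]\n";        # prints "[1] [HA] [GLY]"
```

The map version leaves @matches untouched because it trims a copy ($v); the loop version needs no references because of the aliasing.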
Eugene Yarmash answered Oct 05 '22 18:10


It's refreshing to see a good level of detail in questions! It enables the community to address the problem in a much better fashion.

What I would do is migrate away from the individually named scalars to a hash. This is cleaner and can reduce the number of variables needed in the code.

my @matches = $data =~ m{$regex};   # Populates @matches with ( $1, $2, $3, ..)
my @labels  = qw/serNum helixID initResName .../;   # Create labels

my %record;                                 # Initialize hash
@record{@labels} = grep { s!^\s*|\s*$!!g }  # Strips out leading/trailing spaces
                   @matches[1..$#matches];  # Populate %record with array slice
                                            # Array slice of @matches needed to 
                                            # ignore the $1

# Now data can be accessed as follows:
print $record{helixID};                     # Prints the helix ID in the record

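The grep part may need some explaining. grep aliases $_ to each element of its list, so the substitution edits the elements of @matches in place. It's a fancy way of avoiding having to lexically copy each string inside a map call.

By its nature, grep filters lists: it returns only the elements for which the block is true. This is why the whitespace-stripping regex had to be modified from \s+ to \s*, ensuring that the regex always matches (so s/// always returns a true value) and no items are filtered out.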
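
Here is a runnable sketch of the hash approach, under assumptions: the sample data, the three-label list, and the short regex are stand-ins for the real ones. It also shows the side effect of the grep trick (the elements of @matches are changed) and, for comparison, a non-destructive variant using the /r modifier, which needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical data and a short stand-in regex (assumptions, not the
# question's real pattern).
my $data  = 'HELIX|  1 | HA |GLY  ';
my $regex = qr/^(\w+)\|([^|]*)\|([^|]*)\|([^|]*)$/;
my @labels = qw/serNum helixID initResName/;

my @matches = $data =~ m{$regex}
    or die "no match\n";

# grep approach: s/// edits each slice element in place and, thanks to \s*,
# always returns a true value, so every element is kept.
my %record;
@record{@labels} = grep { s!^\s*|\s*$!!g } @matches[1..$#matches];
print "$record{helixID}\n";          # prints "HA"
print "[$matches[1]]\n";             # prints "[1]": @matches was modified too

# Non-destructive variant (Perl 5.14+): s///r returns the trimmed copy
# and leaves the original element alone.
my @fresh = $data =~ m{$regex};
my %clean;
@clean{@labels} = map { s/^\s+|\s+$//gr } @fresh[1..$#fresh];
print "[$fresh[2]] -> [$clean{helixID}]\n";   # prints "[ HA ] -> [HA]"
```

If the untrimmed captures are needed later, the /r variant is the safer of the two; otherwise the in-place grep is fine.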

Zaid answered Oct 05 '22 18:10