I have a long regular expression that parses a text file into various match variables.
For robustness, the match variables are likely to contain whitespace. I'd like to remove the whitespace in a systematic way by iterating over the match variables.
For example, I have match variables $2 through $14 that contain some whitespace.
I could do:
my @columns = my ($serNum, $helixID, $initResName, $initChainID,
$initSeqNum, $initIcode, $endResName, $endChainID, $endSeqNum,
$endICode, $helixClass, $comment, $length) =
($2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14);
### Remove whitespace
foreach my $element (0..$#columns) {
$columns[$element] =~ s/^\s+//;
$columns[$element] =~ s/\s+$//;
}
But this only removes the whitespace from the elements of @columns, and leaves the properly named scalars, $serNum, $helixID, etc., untouched.
Is there a way to remove the white space in each of the match variables before I copy them to more well-named scalars, or is there a way to iterate over these well-named scalars themselves and remove the whitespace from there?
I presume there might be some way to do this with references.
You can store the match variables in an array first, then strip the whitespace using map:
my @matches = ($2, $3, $4, ...);
my ($serNum, $helixID, ...)
= map { (my $v = $_) =~ s/^\s+|\s+$//g; $v } @matches;
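A runnable sketch of that idea follows. The input line and the three-field pattern are made up for illustration and stand in for the real, longer regex. It also shows the other option raised in the question: foreach aliases its loop variable to each item in the list, so the named scalars can be trimmed in place after assignment, with no explicit references needed.

```perl
use strict;
use warnings;

# Hypothetical input and pattern, standing in for the real ones.
my $line = 'HELIX|  1 | A1  | GLY ';
my ($tag, @matches) = $line =~ /^(\w+)\|([^|]*)\|([^|]*)\|([^|]*)$/
    or die "no match";

# Option 1: trim copies with map, then assign to the named scalars.
my ($serNum, $helixID, $initResName)
    = map { (my $v = $_) =~ s/^\s+|\s+$//g; $v } @matches;

# Option 2: assign first, then trim the named scalars in place.
# The for modifier aliases $_ to each variable, so s/// changes the originals.
my ($num, $id, $res) = @matches;
s/^\s+|\s+$//g for $num, $id, $res;

print join('|', $serNum, $helixID, $initResName), "\n";   # 1|A1|GLY
print join('|', $num, $id, $res), "\n";                   # 1|A1|GLY
```

Option 2 leaves @matches untouched and avoids the temporary copy inside map, at the cost of listing the variable names twice.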
It's refreshing to see a good level of detail in questions! It enables the community to address the problem in a much better fashion.
What I would do is migrate away from the 'well-named' array of elements to a hash. This is cleaner and has the potential to reduce the number of variables needed in code.
my @matches = $data =~ m{$regex}; # Populates @matches with ( $1, $2, $3, ..)
my @labels = qw/serNum helixID initResName .../; # Create labels
my %record; # Initialize hash
# Strip leading/trailing whitespace and populate %record via a hash slice.
# The array slice of @matches is needed to ignore the $1.
@record{@labels} = grep { s!^\s*|\s*$!!g } @matches[1..$#matches];
# Now data can be accessed as follows:
print $record{helixID}; # Prints the helix ID in the record
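The grep part may need some explaining. It's a fancy way of avoiding having to lexically copy each string inside a map call. Inside grep, $_ is an alias for each element, so the substitution edits the elements of @matches in place.
By its nature, grep filters arrays: it keeps only the elements for which the block returns a true value. This is why the whitespace-stripping regex had to be modified from \s+ to \s*, ensuring that the regex always matches, and so no items are filtered out.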
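For comparison, here is a sketch of the same hash-slice idea written with map and the /r modifier (available from Perl 5.14), which returns the trimmed copy instead of modifying the element. This is a swapped-in alternative to the grep trick, not the approach above; the sample record, pattern, and shortened label list are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical record and pattern; the real regex and label list are longer.
my $data  = 'HELIX|  1 | A1  | GLY ';
my $regex = qr/^(\w+)\|([^|]*)\|([^|]*)\|([^|]*)$/;

my @matches = $data =~ m{$regex} or die "no match";
my @labels  = qw/serNum helixID initResName/;

my %record;
# s///r returns the trimmed string and leaves @matches unchanged,
# so \s+ can be used and nothing depends on the match count being true.
@record{@labels} = map { s/^\s+|\s+$//gr } @matches[1..$#matches];

print "$record{helixID}\n";   # A1
```

Because map never filters, a field that is already clean or empty still ends up in %record, and the raw captures stay available in @matches if they are needed later.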