I am attempting to match three letters, and three letters only, from file names in the 1000 Genomes project, given strings like ethnicity_lists/PEL.txt. I should only get PEL. The rest of the string is irrelevant.
my $p1-label = @populations[$p1-index].match(/^ethnicity_lists\/(<[A..Y]>)**3\.txt$/);
The problem is that $p1-label includes the entire string, not just the capture group. I have put the parentheses around <[A..Y]> to emphasize that I only want that group.
Looking through https://docs.perl6.org/routine/match, I tried to be as specific as possible to prevent any possible errors, which is why I include the entire string.
If I do the Perl5-style match:
if @populations[$p1-index] ~~ /^ethnicity_lists\/(<[A..Y]>)**3\.txt$/ {
put $0.join(''); # strange that this outputs an array instead of a string
}
I've tried all of the adverbs for the match method, but none do the necessary job.
How can I restrict a match method to only the capture group in the regex?
The match method returns a Match object that comprises all the information about your match. If you do:
my $p1-label = @populations[$p1-index].match(/^ethnicity_lists\/(<[A..Y]>)**3\.txt$/);
say $p1-label;
You'll see it includes three items keyed as 0, because the **3 quantifier sits outside the parentheses:
「ethnicity_lists/PEL.txt」
0 => 「P」
0 => 「E」
0 => 「L」
Getting the Str representation of the Match object gives you the complete match. But you can also ask for its [0] index:
say $p1-label[0];
[「P」 「E」 「L」]
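This also explains why $0.join('') was needed in the question. As a minimal sketch (the $path value is a hypothetical sample standing in for @populations[$p1-index]), the quantified group stores one submatch per repetition, and joining them rebuilds the label:

```raku
# Hypothetical sample path standing in for @populations[$p1-index].
my $path = 'ethnicity_lists/PEL.txt';

# The quantifier is outside the parentheses, so the group matches three times.
my $match = $path.match(/^ethnicity_lists\/(<[A..Y]>)**3\.txt$/);

my $count = $match[0].elems;  # how many submatches were stored under 0
my $label = $match[0].join;   # stringify each submatch and concatenate

say $count;
say $label;
```

So the join works, but it is repairing something that a better-placed quantifier avoids in the first place.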
Let's fix the regular expression by moving the quantifier inside the parentheses and see what we get.
my $p1-label = @populations[$p1-index].match(/^ethnicity_lists\/(<[A..Y]>**3)\.txt$/);
say $p1-label;
「ethnicity_lists/PEL.txt」
0 => 「PEL」
Looking better. Now if you only want the PEL bit, you've got two options. You can just get the Str representation of the first item in the match:
my $p1-label = @populations[$p1-index].match(/^ethnicity_lists\/(<[A..Y]>**3)\.txt$/)[0].Str;
say $p1-label;
PEL
Note that if I don't coerce it to a Str, I get the Match object of the submatch (which can be useful, but is not what you need).
Or you can use zero-width assertions and skip the capturing altogether:
my $p1-label = @populations[$p1-index].match(/<?after ^ethnicity_lists\/><[A..Y]>**3<?before \.txt$>/).Str;
say $p1-label;
PEL
Here we are matching three uppercase letters (A to Y) that occur after the expression ^ethnicity_lists\/ and before \.txt$, but those surrounding parts aren't included in the match itself.
Or, as pointed out by @raiph, you can use the capture markers <( and )> to tell the system this is the only bit you want:
my $p1-label = @populations[$p1-index].match(/^ethnicity_lists\/<(<[A..Y]>**3)>\.txt$/).Str;
say $p1-label;
PEL
This last one is probably best.
Hope that helps.
@Holli's answer makes a key point and @Scimon's digs in deeper about why you got the result you got, but...
If you doubly emphasize what part you want with <( ... )> instead of just ( ... ), just that part becomes the overall capture object.
And if you use put instead of say, you get the machine-friendly stringification (same as .Str, so in this case PEL) instead of the human-friendly stringification (same as .gist, so in this case it would have been 「PEL」):
put 'fooPELbar' ~~ / foo ( ... ) bar /; # fooPELbar
put 'fooPELbar' ~~ / foo <( ... )> bar /; # PEL
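Applied to the path from the question, that might look like the sketch below. The $path value is a hypothetical sample standing in for @populations[$p1-index], and the extra variables are only there to show what the Match object reports:

```raku
# Hypothetical sample path standing in for @populations[$p1-index].
my $path = 'ethnicity_lists/PEL.txt';

# <( and )> mark where the reported match starts and ends;
# the anchors and the surrounding text must still match.
my $match = $path ~~ /^ethnicity_lists\/ <( <[A..Y]> ** 3 )> \.txt$/;

my $label    = $match.Str;         # plain string form of the overall match
my $start    = $match.from;        # offset where the reported match begins
my $captures = $match.list.elems;  # number of positional captures

put $label;
put $start;
put $captures;
```

Because <( and )> only move the boundaries of the overall match, there is no $0 here, which is why the label is read from the Match object itself rather than from an index.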