I have the following strings:
Data 250 MB
Data 1.5 GB
Data 10 GB
I need to capture only the values 250 MB, 1.5 GB, and 10 GB, so I wrote this expression:
(my $data) = $str1 =~ /Data (\S+ GB|MB)/ or die "$str1\n";
This works for the data in GB, but for MB, I get the result Data 250 MB. Can anyone please explain why?
The alternation operator has the lowest precedence of all regex operators. That is, it tells the regex engine to match either everything to the left of the vertical bar or everything to the right of it, up to the boundaries of the enclosing group (or of the whole pattern if there is no group).
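To see this precedence rule in action, here is a minimal sketch that runs the pattern from the question against a few inputs. The helper name capture_original and the extra test string 'Data MB' are made up for illustration; they are not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from the question.
# Inside the group the alternatives are "\S+ GB" and "MB", so "Data "
# must be followed directly by one of them.
sub capture_original {
    my ($str) = @_;
    my ($data) = $str =~ /Data (\S+ GB|MB)/;
    return $data;    # undef when the pattern does not match
}

for my $str ('Data 1.5 GB', 'Data 250 MB', 'Data MB') {
    my $data = capture_original($str);
    if (defined $data) {
        print "'$str' matched, captured '$data'\n";
    }
    else {
        print "'$str' did not match\n";
    }
}
```

Only the second string fails: after "Data " the engine finds neither a "\S+ GB" sequence nor a literal "MB", which is why the die in the question fires and prints the input string.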
Matching against $_ is merely the default; the binding operator (=~) tells Perl to match the pattern on the right against the string on the left, instead of matching against $_.
Under /a, \d always means precisely the digits "0" to "9"; \s means the five characters [ \f\n\r\t], and starting in Perl v5.18, the vertical tab; \w means the 63 characters [A-Za-z0-9_]; and likewise, all the Posix classes such as [[:print:]] match only the appropriate ASCII-range characters.
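The alternation operator doesn't operate on words; it splits everything inside the enclosing group, so your capture group means
(\S+ GB) or (MB)
For Data 250 MB, neither alternative matches directly after "Data ", so the match fails and the die prints the string.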
Add non-capturing parentheses:
/Data (\S+ (?:GB|MB))/
which you can further simplify with a character class:
/Data (\S+ [GM]B)/
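As a quick check, the character-class version can be run against the three strings from the question. This is only a sketch; the helper name extract_size is hypothetical and not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the size with its unit, or undef if the
# string does not match.
sub extract_size {
    my ($str) = @_;
    my ($data) = $str =~ /Data (\S+ [GM]B)/;
    return $data;
}

for my $str ('Data 250 MB', 'Data 1.5 GB', 'Data 10 GB') {
    my $data = extract_size($str);
    defined $data or die "$str\n";
    print "$data\n";
}
```

Note that [GM]B only works because the two units differ in a single character; for units that differ more, the (?:GB|MB) form is the one that generalizes.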
You defined your capture group as (\S+ GB|MB), which matches either \S+ GB or MB (i.e. | is applied to the whole capture group).
You want one of:
/Data (\S+ GB|\S+ MB)/
/Data (\S+ (GB|MB))/
or, even better, /Data (\S+ (?:GB|MB))/, which uses a non-capturing group.
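The practical difference between the capturing and non-capturing inner group shows up in what a list-context match returns. The sketch below illustrates this with one of the question's strings; the variable names are chosen here for illustration.

```perl
use strict;
use warnings;

my $str = 'Data 250 MB';

# Capturing inner group: the match returns two values, $1 and $2.
my @with_capture = $str =~ /Data (\S+ (GB|MB))/;

# Non-capturing inner group: only the outer group is returned.
my @without_capture = $str =~ /Data (\S+ (?:GB|MB))/;

print scalar(@with_capture),    " values: @with_capture\n";
print scalar(@without_capture), " value: @without_capture\n";
```

With a one-element assignment such as (my $data) = ..., both forms yield the same $data, since the extra capture is simply discarded; the non-capturing form just avoids creating a group you never use.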