
Perl regex alternation

Tags:

regex

perl

I have the following strings:

Data 250 MB
Data 1.5 GB
Data 10 GB

I need to capture only the values 250 MB, 1.5 GB, 10 GB. So I wrote the expression

(my $data) = $str1 =~ /Data (\S+ GB|MB)/ or die "$str1\n";

This works for the data in GB, but for MB, I get the result Data 250 MB. Can anyone please explain why?

asked Nov 13 '15 15:11 by user828647



2 Answers

The alternation operator doesn't operate on words; it splits everything inside the enclosing group, so your capture group means

(\S+ GB) or (MB)
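
To see that reading in action, here is a minimal sketch that runs the pattern from the question against three inputs. The third string, "Data MB", is a made-up input added only to show what the second alternative accepts; it is not from the question.

```perl
use strict;
use warnings;

# The pattern from the question: inside the group, | separates
# "\S+ GB" from a bare "MB".
my $re = qr/Data (\S+ GB|MB)/;

# 'Data MB' is a hypothetical input, used only for illustration.
for my $str ('Data 250 MB', 'Data 10 GB', 'Data MB') {
    if (my ($data) = $str =~ $re) {
        print "'$str' matched, captured '$data'\n";
    } else {
        print "'$str' did not match\n";
    }
}
```

"Data 250 MB" does not match at all, which is why the `or die "$str1\n"` in the question prints the whole input string.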

Add non-capturing parentheses:

/Data (\S+ (?:GB|MB))/

which you can further simplify with a character class:

/Data (\S+ [GM]B)/
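
As a quick check, the character-class version can be run over the three strings from the question. This is only a sketch; `extract_size` is a hypothetical helper name introduced for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured size, or undef if the
# string does not match.
sub extract_size {
    my ($str) = @_;
    my ($size) = $str =~ /Data (\S+ [GM]B)/;
    return $size;
}

for my $str ('Data 250 MB', 'Data 1.5 GB', 'Data 10 GB') {
    my $size = extract_size($str);
    print defined $size ? "$size\n" : "no match: $str\n";
}
```

This prints 250 MB, 1.5 GB and 10 GB, one per line.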

answered Sep 29 '22 19:09 by choroba


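You defined your capture group as (\S+ GB|MB), which matches either \S+ GB or MB (i.e. | applies to the whole contents of the capture group). For "Data 250 MB" neither alternative matches after "Data ", so the match fails and die prints the original string.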

You want either:

  1. /Data (\S+ GB|\S+ MB)/
  2. /Data (\S+ (GB|MB))/, or even better /Data (\S+ (?:GB|MB))/, which uses a non-capturing group.
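
The difference between the two forms in option 2 shows up in what the match returns in list context. A small sketch, using one string from the question (the variable names are illustrative):

```perl
use strict;
use warnings;

my $str = 'Data 250 MB';

# Capturing inner group: the match returns two values,
# the full size and the unit on its own.
my @with_capture = $str =~ /Data (\S+ (GB|MB))/;

# Non-capturing inner group: only the outer capture is returned.
my @without_capture = $str =~ /Data (\S+ (?:GB|MB))/;

print scalar(@with_capture), " captures: @with_capture\n";
print scalar(@without_capture), " capture: @without_capture\n";
```

With `(my $data) = ...` as in the question, both forms assign "250 MB" to `$data`; the non-capturing version simply avoids producing a second capture that is never used.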

answered Sep 29 '22 18:09 by el.pescado - нет войне