
Why is \d+ not matching all digits?

Tags: regex, ruby

I have the following regular expression:

REGEX = /^.+(\d+.+(?=AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY)[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/

I have the following string:

str = "fdsfd 8126 E Bowen AVE Bensalem, PA 19020-1642 dfdf"

Notice my capturing group begins with one or more digits that match the pattern. Yet this is what I get:

str =~ REGEX
$1
 => "6 E Bowen AVE Bensalem, PA 19020-1642" 

Or

match = str.match(REGEX)
match[1]
=> "6 E Bowen AVE Bensalem, PA 19020-1642"

Why is it missing the first three digits (812)?

Asked by Daniel Viglione, Mar 14 '18 20:03



1 Answer

The regex below works as expected (you can verify it on Regex101):

REGEX = /^.+?(\d+.+(?=AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY)[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/

Note the question mark added near the beginning of the regex:

/^.+?(\d+...
    ^ 

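By default, your first .+ is greedy: it consumes as much as it can, digits included, and then backtracks only as far as needed for the rest of the regex to match. Since \d+ is satisfied by a single digit, the greedy .+ keeps 812 and leaves only the 6 for the capturing group. Adding ? after the plus makes the quantifier lazy instead of greedy, so it stops at the first position where the rest of the pattern can match.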
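To see the difference side by side, here is a minimal sketch. It assumes a simplified version of the pattern with the state-code lookahead left out for brevity; the constant names GREEDY, LAZY and SAMPLE are only illustrative. The two patterns differ solely in the quantifier on the leading prefix.

```ruby
# Simplified pattern: the state-code lookahead from the question is omitted.
# GREEDY uses ".+" for the prefix, LAZY uses ".+?"; everything else is identical.
GREEDY = /^.+(\d+.+[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/
LAZY   = /^.+?(\d+.+[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/

SAMPLE = "fdsfd 8126 E Bowen AVE Bensalem, PA 19020-1642 dfdf"

# String#[] with a regexp and a group index returns that capture group.
puts SAMPLE[GREEDY, 1]  # 6 E Bowen AVE Bensalem, PA 19020-1642
puts SAMPLE[LAZY, 1]    # 8126 E Bowen AVE Bensalem, PA 19020-1642
```

The greedy prefix swallows "fdsfd 812" because giving back one digit is enough for \d+ to succeed; the lazy prefix stops at "fdsfd " and lets \d+ take the whole number.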

An alternative would be to keep the prefix from matching digits at all, like this:

/^[^\d]+(\d+...

[^\d]+ matches one or more characters that are not digits (it is equivalent to \D+), so the prefix stops right before the first digit.
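A quick sketch of the negated-class variant, again assuming a simplified pattern without the state-code lookahead and using \D+ as shorthand for [^\d]+. The names NO_DIGIT_PREFIX, INPUT and LEADING_DIGIT are hypothetical. It also shows one caveat: because the prefix must contain at least one non-digit character, a string that starts directly with the street number does not match.

```ruby
# \D+ is shorthand for [^\d]+: the prefix may not contain any digit.
# Simplified pattern: the state-code lookahead from the question is omitted.
NO_DIGIT_PREFIX = /^\D+(\d+.+[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/

INPUT = "fdsfd 8126 E Bowen AVE Bensalem, PA 19020-1642 dfdf"
puts INPUT[NO_DIGIT_PREFIX, 1]  # 8126 E Bowen AVE Bensalem, PA 19020-1642

# Caveat: with no non-digit prefix at all, \D+ cannot match, so there is no result.
LEADING_DIGIT = "8126 E Bowen AVE Bensalem, PA 19020-1642 dfdf"
p LEADING_DIGIT[NO_DIGIT_PREFIX, 1]  # nil
```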

Answered by Adam, Nov 15 '22 11:11