Why do these two regexes return different results in Ruby based on the position of the underscore?

Tags:

regex

ruby

I have the following:

[11] pry(main)> "ab BN123-4.56".scan(/BN([0-9_\.-]+)/)
=> [["123-4.56"]]
[12] pry(main)> "ab BN123-4.56".scan(/BN([0-9\.-_]+)/)
=> [["123"]]

I am unsure why the second one, with the underscore at the end, behaves differently from the first. How does the regex parser interpret it differently?

timpone asked Apr 28 '14 12:04


2 Answers

It's because you have the hyphen (-) placed in the middle of the character class without being escaped.

Within a character class [], a hyphen (-) is matched literally only when it is the first or last character. If you place the hyphen anywhere else, you need to escape it (\-) for it to be matched literally; otherwise it is read as a range operator between its neighbours.

"ab BN123-4.56".scan(/BN([0-9_\.-]+)/)   # => [["123-4.56"]]
"ab BN123-4.56".scan(/BN([0-9\.\-_]+)/)  # => [["123-4.56"]]

Note: You don't need to escape the dot (.) inside a character class either, so you could rewrite this as:

"ab BN123-4.56".scan(/BN([0-9_.-]+)/)    # => [["123-4.56"]]

Or use the following if you choose to place the hyphen in the middle of the character class:

"ab BN123-4.56".scan(/BN([0-9.\-_]+)/)   # => [["123-4.56"]]
hwnd answered Sep 22 '22 10:09


The hyphen is messing things up, not the underscore.

- is a special character inside a character class, indicating a range. One way to escape it is to put it at the beginning or the end of the class: [...-].

So [_.-] checks for a character, either _ or . or -.

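And [.-_] checks for a character in the range "from . to _", that is, code points 46 through 95 in ASCII. That range includes the digits but not the hyphen (code point 45), so the match stops after 123.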
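The range interpretation can be checked directly in Ruby. This is a minimal sketch, assuming ASCII-compatible input; the variable names are made up for illustration and are not part of the original question.

```ruby
# Code points of the three characters involved in the character class.
dot        = ".".ord   # 46
underscore = "_".ord   # 95
hyphen     = "-".ord   # 45

# [.-_] is read as the range of code points from "." to "_".
dot_to_underscore = (dot..underscore)

hyphen_in_range = dot_to_underscore.cover?(hyphen)     # hyphen sits just below the range
digit_in_range  = dot_to_underscore.cover?("1".ord)    # digits (48..57) fall inside it

input = "ab BN123-4.56"

# Hyphen in the middle: treated as a range, so the literal "-" ends the match.
range_result   = input.scan(/BN([0-9\.-_]+)/)

# Hyphen at the end: treated as a literal character.
literal_result = input.scan(/BN([0-9_\.-]+)/)

puts "hyphen in range? #{hyphen_in_range}"
puts "digit in range?  #{digit_in_range}"
p range_result
p literal_result
```

Because the hyphen's code point falls just outside the range, the first pattern stops at the hyphen, while the second keeps consuming through to the end of the number.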

Illustration

BN([0-9.\-_]+) does what you expect and selects 123-4.56 from ab BN123-4.56.

Robin answered Sep 20 '22 10:09