I have the following:
[11] pry(main)> "ab BN123-4.56".scan(/BN([0-9_\.-]+)/)
=> [["123-4.56"]]
[12] pry(main)> "ab BN123-4.56".scan(/BN([0-9\.-_]+)/)
=> [["123"]]
I am unsure why the second one, with the underscore at the end, behaves differently from the first. How is the regex parser interpreting it to make it different?
It's because you have the hyphen (-) placed in the middle of the character class without being escaped.
Within a character class [], you can place a hyphen (-) as the first or last character. If you place the hyphen anywhere else you need to escape it (\-) in order to be matched.
"ab BN123-4.56".scan(/BN([0-9_\.-]+)/)   # => '123-4.56'
"ab BN123-4.56".scan(/BN([0-9\.\-_]+)/)  # => '123-4.56'
Note: You don't need to escape the dot (.) inside a character class either, so you could rewrite this as:
"ab BN123-4.56".scan(/BN([0-9_.-]+)/)    # => '123-4.56'
Or even the following, if you choose to place the hyphen in the middle of the character class:
"ab BN123-4.56".scan(/BN([0-9.\-_]+)/)   # => '123-4.56'
The hyphen is messing things up, not the underscore.
The hyphen (-) is a special character inside a character class, indicating a range. One way to make it literal without a backslash is to put it at the beginning or the end of the class: [...-].
So [_.-] checks for a character, either _ or . or -.
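And [.-_] checks for a character in the range "from . to _", that is, code points 0x2E through 0x5F. That range covers the digits and the uppercase letters but not the hyphen itself (0x2D), which is why the match stops at 123.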
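To see what the accidental range contains, here is a minimal sketch that lists the characters between . and _ by code point and checks whether the hyphen is among them. The variable names (range_chars, in_range, unescaped, escaped) are only illustrative and are not from the question.

```ruby
# Build the list of characters covered by the range "." through "_".
range_chars = (".".ord.."_".ord).map(&:chr)

# Code points of the three characters involved.
puts "hyphen: #{'-'.ord}, dot: #{'.'.ord}, underscore: #{'_'.ord}"

# Print the characters the range [.-_] would accept.
puts range_chars.join

# Is the hyphen inside that range?
in_range = range_chars.include?("-")
puts "hyphen in range? #{in_range}"

# Compare the unescaped (range) form with the escaped (literal hyphen) form.
unescaped = "ab BN123-4.56".scan(/BN([0-9\.-_]+)/)
escaped   = "ab BN123-4.56".scan(/BN([0-9.\-_]+)/)
puts unescaped.inspect
puts escaped.inspect
```

Because the hyphen sits one code point below the dot, it falls just outside the range, so the first pattern stops matching at the hyphen while the second continues through it.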
Illustration
BN([0-9.\-_]+) does what you expect and selects 123-4.56 from ab BN123-4.56.