
ruby regex .scan

Tags:

regex

ruby

I'm using Ruby's scan() method to find text in a particular format. I then output it into a string separated by commas. The text I'm trying to find would look like this:

AB_ABCD_123456

Here's what I've come up with so far to find the above. It works fine:

# scan returns a new array of matches; it does not modify text
codes = text.scan(/.._...._[0-9][0-9][0-9][0-9][0-9][0-9]/)
puts codes.uniq.sort.join(', ')

Now I need a regex that will find the above with or without a two-letter country designation at the end. For example, I would like to be able to find all three of the below:

AB_ABCD_123456
AB_ABCD_123456UK
AB_ABCD_123456DE

I know I could use two or three different scans to achieve my result, but I'm wondering if there's a way to get all three with one regex.

michaelmichael asked Aug 05 '09 21:08



3 Answers

/.._...._\d{6}(?:[A-Z]{2})?/
Avdi answered Nov 22 '22 13:11



Why not just use split?

"AB_ABCD_123456".split(/_/).join(',')

Handles the cases you listed without modification.
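A quick sketch of what the split call returns for the three sample codes from the question. The `samples` and `parts` names are only for illustration. Note that this breaks one already-isolated code into its parts; it does not locate codes inside longer text.

```ruby
# The three formats listed in the question.
samples = %w[AB_ABCD_123456 AB_ABCD_123456UK AB_ABCD_123456DE]

# Split each code on underscores and rejoin the pieces with commas.
parts = samples.map { |code| code.split(/_/).join(',') }

puts parts
```

The country suffix, when present, stays attached to the numeric part.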

ezpz answered Nov 22 '22 12:11



/.._...._[0-9][0-9][0-9][0-9][0-9][0-9](?:[A-Z][A-Z])?/

You can also use {} to make the regex shorter:

/.{2}_.{4}_[0-9]{6}(?:[A-Z]{2})?/

Explanation: ? makes the preceding pattern optional. () groups expressions together (so Ruby knows the ? applies to both letters, not just the last one). The ?: after the opening ( makes the group non-capturing. This matters because when a pattern contains capturing groups, scan returns the groups' captures for each match instead of the whole matched string.
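To make the difference concrete, here is a minimal sketch comparing the two kinds of group. The sample `text` is made up for illustration; only the three code formats come from the question.

```ruby
# Hypothetical sample text containing the three formats (one of them twice).
text = "Order AB_ABCD_123456 shipped, then AB_ABCD_123456UK and " \
       "AB_ABCD_123456DE; AB_ABCD_123456UK again."

# Non-capturing group: scan returns the full matched strings.
codes = text.scan(/.{2}_.{4}_[0-9]{6}(?:[A-Z]{2})?/)
puts codes.uniq.sort.join(', ')
# prints: AB_ABCD_123456, AB_ABCD_123456DE, AB_ABCD_123456UK

# Capturing group: scan returns one array of captures per match,
# with nil where the optional suffix is absent.
suffixes = text.scan(/.{2}_.{4}_[0-9]{6}([A-Z]{2})?/)
p suffixes
# prints: [[nil], ["UK"], ["DE"], ["UK"]]
```

With the capturing version you get only the country suffixes back, which is why the non-capturing form is the one to use here.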

sepp2k answered Nov 22 '22 13:11
