
Regexp Scan results

Tags: regex, ruby

Does anybody know why I am getting different results depending on the order of the patterns?

list1 = ["AA1", "AA2","AA", "AA+"]
list2 = ["AA1", "AA2","AA+", "AA"]
results1 = "somethin with AA+ in it".scan(Regexp.union(list1))
results2 = "somethin with AA+ in it".scan(Regexp.union(list2))

results1 is ["AA"], while results2 is ["AA+"].

I may be misunderstanding how scan works, but I was expecting it to return every occurrence, hence both "AA" and "AA+". Also, I don't get why the output changes depending on the order of the strings used.

Jack asked Feb 23 '26 21:02



1 Answer

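In a backtracking (NFA) regex engine, the branches of an alternation are tried from left to right at each position, and the first branch that matches "wins", even if a later branch would match a longer string. See Alternation with The Vertical Bar or Pipe Symbol for a more detailed explanation.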
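A minimal sketch of that rule, using the same two alternatives against the same input in both orders. The variable names are only for illustration:

```ruby
# Same input, same alternatives, different branch order.
prefix_first = "AA+"[/AA|AA\+/]   # "AA" is tried first and succeeds
longer_first = "AA+"[/AA\+|AA/]   # "AA\+" is tried first and succeeds

p prefix_first
p longer_first
```

Only the order of the branches differs, yet the first pattern stops at "AA" while the second consumes the "+" as well.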

The regexes you have are

Regex 1: (?-mix:AA1|AA2|AA|AA\+)
Regex 2: (?-mix:AA1|AA2|AA\+|AA)

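If you use the first regex, you get AA: at that position the AA1 and AA2 branches fail, the AA branch matches, and the AA\+ branch is never tried. The match is returned and the regex index advances past it. Because scan resumes after the end of each match, matches never overlap, and the remaining + matches no branch on its own.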

The second regex yields AA+ because the AA\+ branch comes before AA and matches first. The match is returned, and the AA branch is not even tested.
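If the goal is to make the result independent of the list order, one possible approach (a sketch, not the only option) is to sort the strings so that longer alternatives come before their prefixes. If overlapping hits such as both "AA" and "AA+" are wanted, each string can be scanned for separately. The variable names below are hypothetical:

```ruby
text = "somethin with AA+ in it"
list = ["AA1", "AA2", "AA", "AA+"]

# Longest alternatives first, so "AA+" is tried before its prefix "AA".
longest_first = Regexp.union(list.sort_by { |s| -s.length })
longest_match = text.scan(longest_first)

# Scan for each string on its own (a String pattern is matched literally),
# which also collects matches that overlap each other.
all_matches = list.flat_map { |s| text.scan(s) }

p longest_match
p all_matches
```

With the sorted union, scan returns only "AA+"; the per-string scan returns both "AA" and "AA+", at the cost of one pass over the text per string.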

Wiktor Stribiżew answered Feb 25 '26 13:02



