
regex repetition aa,bb,cc

Tags:

regex

ruby

I expected the following to work (and it does):

x = '"aa","bb","cc"'

x =~ /\A(".*?",){2}".*?"\Z/
#=> 0 

...but I did not expect the following two to match (and I don't want them to). I deliberately used ? to make .* non-greedy:

x =~ /\A(".*?",){0}".*?"\Z/
#=> 0 

x =~ /\A(".*?",){1}".*?"\Z/
#=> 0 

I expect: the beginning of the string (\A), followed by "aa", and then "bb", (that's two repetitions of the group, i.e. {2}), then "cc" and finally the end of the string (\Z).

I understand why they are working, but I want to understand how to achieve what I want...

I want it to fail on the last two examples above (but it doesn't). Put another way, I want the following to fail:

x = '"aa","bb","cc","dd"'

x =~ /\A(".*?",){2}".*?"\Z/
#=> 0 

It should see \A, "aa", "bb", "cc" and then FAIL because the next character is , rather than \Z.

user664833 asked Jul 11 '19 05:07


1 Answer

The problem is that . is too generic: even a non-greedy .*? will match , or " when that is the only way for the overall pattern to succeed:

'"aa","bb","cc"'.match(/\A(".*?",){1}(".*?")\Z/).captures
#=> ["\"aa\",", "\"bb\",\"cc\""]
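The {0} case from the question can be inspected the same way. A minimal sketch, assuming the same input string (the variable names `m` and `last_field` are only illustrative): with zero repetitions of the group, the final ".*?" is the only part left to consume text, so it stretches across every comma and quote until a quote sits right before \Z.

```ruby
x = '"aa","bb","cc"'

# With {0} the repeated group contributes nothing, so the trailing ".*?"
# must cover the whole string for \Z to be reachable.
m = x.match(/\A(".*?",){0}(".*?")\Z/)

# Second capture group: the text swallowed by the final quoted field.
last_field = m[2]
puts last_field
```

The printed value is the entire input, which is why the pattern still reports a match at index 0.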

Also, there is no difference between a greedy and a non-greedy match if both have to continue until the end of the string: /.*\Z/ matches the same text as /.*?\Z/.
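A quick way to check this claim, as a small sketch on the string from the question (`greedy` and `lazy` are illustrative names): both variants are forced by \Z to end at the last quote, so they extract the same substring.

```ruby
x = '"aa","bb","cc"'

# Greedy: .* grabs everything, then backtracks to the final quote.
greedy = x[/".*"\Z/]

# Lazy: .*? grows one character at a time until "\Z finally fits.
lazy = x[/".*?"\Z/]

puts greedy == lazy
```

The two quantifiers only differ in the order in which candidates are tried, not in what is allowed to match once the end anchor pins the result.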

Since you need to keep \Z, you could instead replace . with [^"], so that the content of a quoted field can never run past its closing ".

three = '"aa","bb","cc"'
four = '"aa","bb","cc","dd"'

pattern = /\A("[^"]*",){2}"[^"]*"\Z/

(three =~ pattern) && (four !~ pattern)
#=> true
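If the number of fields varies, the same pattern can be built from a count. This is only a sketch: the helper name `n_quoted_fields?` is hypothetical, and it assumes fields never contain an escaped quote.

```ruby
# Returns true when text consists of exactly n double-quoted fields
# separated by commas, e.g. '"aa","bb","cc"' for n = 3.
def n_quoted_fields?(text, n)
  raise ArgumentError, 'n must be at least 1' if n < 1

  # n - 1 fields followed by a comma, then one last field before \Z.
  pattern = /\A(?:"[^"]*",){#{n - 1}}"[^"]*"\Z/
  text.match?(pattern)
end

three = '"aa","bb","cc"'
four = '"aa","bb","cc","dd"'

puts n_quoted_fields?(three, 3)
puts n_quoted_fields?(four, 3)
```

The group is non-capturing here because only the yes/no answer is needed; String#match? requires Ruby 2.4 or later.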


If the regex becomes too unreadable, an alternative would be to try to parse your text as a JSON array:

require 'json'
three = '"aa","bb","cc"'
four = '"aa","bb","cc","dd"'

def has_n_strings?(text, n)
  words = JSON.parse("[#{text}]")
  words.all?(String) && words.size == n
end

puts has_n_strings?(three, 3)
# true
puts has_n_strings?(three, 4)
# false
puts has_n_strings?(four, 4)
# true
puts has_n_strings?(four, 3)
# false
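One caveat with the JSON approach is that JSON.parse raises JSON::ParserError on input that is not valid JSON. A possible variant, sketched under the assumption that malformed input should simply count as "no" (the name `safe_has_n_strings?` is made up for this example):

```ruby
require 'json'

# Like has_n_strings?, but treats unparseable input as a non-match
# instead of letting the exception propagate.
def safe_has_n_strings?(text, n)
  words = JSON.parse("[#{text}]")
  words.size == n && words.all? { |w| w.is_a?(String) }
rescue JSON::ParserError
  false
end

puts safe_has_n_strings?('"aa","bb","cc"', 3)
puts safe_has_n_strings?('"aa",bb', 2)
```

The block form of all? is used so the sketch also runs on Ruby versions older than 2.5, where all?(String) is not available.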
Eric Duminil answered Sep 24 '22 14:09