
Grails/Groovy regular expression: how to use (?i) to make everything case insensitive?

I use the following RegEx:

url (blank:false, matches: /^(https?:\/\/)(?:[A-Za-z0-9]+([\-\.][A-Za-z0-9]+)*\.)+[A-Za-z]{2,40}(:[1-9][0-9]{0,4})?(\/\S*)?/)

I want to add (?i) to make everything case insensitive. How should I add this?

Sarit Rotshild asked Dec 08 '15 09:12



1 Answer

I can confirm that putting (?i) at the beginning of the regex makes the whole pattern case insensitive.
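As a minimal sketch of where the flag goes, here is the pattern from the question with (?i) placed right after the opening slash, before the ^ anchor. The sample URLs and the variable names are made up for illustration. The same slashy string can be passed to the matches: constraint shown in the question.

```groovy
// Pattern from the question with (?i) prepended; everything after the flag
// is matched case-insensitively, including the "http"/"https" scheme.
def pattern = /(?i)^(https?:\/\/)(?:[A-Za-z0-9]+([\-\.][A-Za-z0-9]+)*\.)+[A-Za-z]{2,40}(:[1-9][0-9]{0,4})?(\/\S*)?/

// Same pattern without the flag, for comparison
def caseSensitive = /^(https?:\/\/)(?:[A-Za-z0-9]+([\-\.][A-Za-z0-9]+)*\.)+[A-Za-z]{2,40}(:[1-9][0-9]{0,4})?(\/\S*)?/

// Hypothetical sample URLs, used only for illustration
assert 'https://www.example.com/index.html' ==~ pattern
assert 'HTTPS://WWW.EXAMPLE.COM/Index.html' ==~ pattern

// Without the flag the upper-case scheme is rejected
assert !('HTTPS://WWW.EXAMPLE.COM/Index.html' ==~ caseSensitive)
```

Note that the character classes in the original pattern already list both cases, so in practice the flag mostly affects the scheme part.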

If your goal is also to shorten the regex, you can use Groovy's dollar slashy string form ($/ ... /$). In that form forward slashes / do not need to be escaped (the escape character becomes $).
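A small sketch of the difference, assuming a deliberately simplified URL fragment rather than the full pattern (the variable names are illustrative only):

```groovy
// The same regex fragment written in both string forms
def slashy       = /https?:\/\/\S+/     // forward slashes must be escaped
def dollarSlashy = $/https?://\S+/$     // forward slashes are written as-is

// Both literals produce exactly the same pattern string
assert slashy == dollarSlashy

// Hypothetical sample URL, used only for illustration
assert 'http://example.org/a' ==~ slashy
assert 'http://example.org/a' ==~ dollarSlashy
```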

In addition:

  • the POSIX character class \p{Alnum} is the compact equivalent of [0-9a-zA-Z]. It covers both cases, so (?i) is only needed if the scheme (http, https) should also match in upper case.

  • remove the unneeded backslashes from the character class: [\-\.] -> [-.] (escaping the dash is not required when it is the first or the last element, and the dot is always literal inside a character class).

  • remove the unneeded round brackets from the protocol section.
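The three points above can be checked with a few assertions. This is only a sketch with made-up sample strings, not part of the original patterns:

```groovy
// \p{Alnum} matches the same ASCII letters and digits as [0-9a-zA-Z]
assert 'Abc123' ==~ /\p{Alnum}+/
assert 'Abc123' ==~ /[0-9a-zA-Z]+/
assert !('abc_123' ==~ /\p{Alnum}+/)   // underscore is not alphanumeric

// A dash at the edge of a character class and a dot inside it are literal
assert '-' ==~ /[-.]/
assert '.' ==~ /[-.]/
assert !('x' ==~ /[-.]/)               // the dot is not a wildcard here

// The round brackets around the protocol do not change what is matched
assert 'https://' ==~ /(https?:\/\/)/
assert 'https://' ==~ /https?:\/\//
```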

In the following version I take advantage of the multiline support of dollar slashy string and the free-spacing regex flag (?x):

$/(?x)
  ^                      # start of the string
  https?://              # http:// or https://, no round brackets needed
  (                      # start of group 1; could be non-capturing (?: ... ), but that is less readable
    \p{Alnum}+           # one or more alphanumeric chars, instead of [a-zA-Z0-9]
    ([.-]\p{Alnum}+)*    # zero or more of (literal dot or dash followed by one or more [a-zA-Z0-9])
    \.                   # a literal dot
  )+                     # repeat group 1 one or more times
  \p{Alpha}{2,40}        # between 2 and 40 alphabetic chars [a-zA-Z]
  (:[1-9][0-9]{0,4})?    # [optional] a literal colon ':' followed by a port: one non-zero digit, then up to 4 more digits
  (/\S*)?                # [optional] a literal slash '/' followed by zero or more non-space chars
/$

A dollar-slashy compact version:

$/^https?://(\p{Alnum}+([.-]\p{Alnum}+)*\.)+\p{Alpha}{2,40}(:[1-9][0-9]{0,4})?(/\S*)?/$

If you must use the slashy version, this is an equivalent:

/^https?:\/\/(?:\p{Alnum}+([.-]\p{Alnum}+)*\.)+\p{Alpha}{2,40}(:[1-9][0-9]{0,4})?(\/\S*)?/

A snippet of code to test all these regexes:

def multiline_pattern = $/(?x)
  ^                      # start of the string
  https?://              # http:// or https://, no round brackets needed
  (                      # start of group 1; could be non-capturing (?: ... ), but that is less readable
    \p{Alnum}+           # one or more alphanumeric chars, instead of [a-zA-Z0-9]
    ([.-]\p{Alnum}+)*    # zero or more of (literal dot or dash followed by one or more [a-zA-Z0-9])
    \.                   # a literal dot
  )+                     # repeat group 1 one or more times
  \p{Alpha}{2,40}        # between 2 and 40 alphabetic chars [a-zA-Z]
  (:[1-9][0-9]{0,4})?    # [optional] a literal colon ':' followed by a port: one non-zero digit, then up to 4 more digits
  (/\S*)?                # [optional] a literal slash '/' followed by zero or more non-space chars
/$

def compact_pattern = $/^https?://(\p{Alnum}+([.-]\p{Alnum}+)*\.)+\p{Alpha}{2,40}(:[1-9][0-9]{0,4})?(/\S*)?/$

def slashy_pattern  = /^https?:\/\/(?:\p{Alnum}+([.-]\p{Alnum}+)*\.)+\p{Alpha}{2,40}(:[1-9][0-9]{0,4})?(\/\S*)?/

def url1    = 'https://www.example-test.domain.com:12344/aloha/index.html'
def notUrl1 = 'htxps://www.example-test.domain.com:12344/aloha/index.html'
def notUrl2 = 'https://www.example-test.domain.com:02344/aloha/index.html'

assert url1 ==~ multiline_pattern
assert url1 ==~ compact_pattern
assert url1 ==~ slashy_pattern

assert !( notUrl1 ==~ multiline_pattern )
assert !( notUrl1 ==~ compact_pattern   )
assert !( notUrl1 ==~ slashy_pattern    )

assert !( notUrl2 ==~ multiline_pattern )
assert !( notUrl2 ==~ compact_pattern   )
assert !( notUrl2 ==~ slashy_pattern    )
Giuseppe Ricupero answered Oct 05 '22 10:10
