 

Ruby regex to allow A-Za-z0-9

Tags:

regex

ruby

I have the following regex:

/([A-Za-z0-9]+)([A-Za-z0-9\-\_]+)([A-Za-z0-9]+)/

It does not work as I need it to. My requirements are:

  • do not allow spaces
  • allow capital English letters
  • allow lowercased English letters
  • allow digits
  • the string must not contain both a hyphen and an underscore
  • hyphens: a hyphen cannot be at the beginning or at the end of the string; there can be any number of hyphens, but only 1 in a row (a--b is invalid)
  • underscores: an underscore cannot be at the beginning or at the end of the string; there can be any number of underscores, but only 1 in a row (a__b is invalid)
  • the string must contain at least 1 letter
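The requirements above can be encoded directly as a plain-Ruby predicate, one rule per line, which gives a reference to test any candidate regex against. This is a sketch, not part of the question: the helper name `valid_name?` is hypothetical, and it is checked here against the valid and invalid example strings the question lists.

```ruby
# Hypothetical reference helper: checks each requirement separately,
# so a single-regex solution can be compared against it.
def valid_name?(s)
  return false unless s.match?(/\A[A-Za-z0-9_-]+\z/)                # only letters, digits, _ and -; no spaces
  return false if s.include?("-") && s.include?("_")               # not both delimiters
  return false if s.start_with?("-", "_") || s.end_with?("-", "_") # no leading/trailing delimiter
  return false if s.include?("--") || s.include?("__")             # no doubled delimiter
  s.match?(/[A-Za-z]/)                                             # at least one letter
end

%w[a1_b_2_hello 2b-ffg-er2 abs 123a].each { |s| puts "#{s}: #{valid_name?(s)}" }            # all true
%w[_a1_b_2_hello 2b-ffg_er2- __ -- a__ b--2].each { |s| puts "#{s}: #{valid_name?(s)}" }    # all false
```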

Valid examples:

  • a1_b_2_hello
  • 2b-ffg-er2
  • abs
  • 123a

Invalid examples:

  • _a1_b_2_hello
  • 2b-ffg_er2-
  • __
  • --
  • a__
  • b--2
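One reason the original pattern misbehaves can be checked quickly: it has no `\A`/`\z` anchors, so `match?` succeeds whenever some substring fits. A minimal check against three of the invalid examples (the variable name `original` is just for illustration):

```ruby
# The pattern from the question, unchanged.
original = /([A-Za-z0-9]+)([A-Za-z0-9\-\_]+)([A-Za-z0-9]+)/

# Without anchors, any matching substring is enough, so these invalid inputs pass.
%w[_a1_b_2_hello 2b-ffg_er2- b--2].each do |s|
  puts "#{s.ljust(15)} #{s.match?(original)}"  # prints true for all three
end

# The matched text shows the leading underscore was simply skipped.
puts original.match("_a1_b_2_hello")[0]  # prints a1_b_2_hello
```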
Andrey Deineko asked Jan 25 '23 23:01


2 Answers

I find it convenient to put all the special conditions at the beginning, in positive and negative lookaheads (which consume no characters), and follow them with the general requirement, here [a-z\d_-]+\z.
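The zero-width behavior of lookaheads can be seen in isolation. This sketch uses a simplified two-condition pattern (the name `two_rules` is hypothetical, not the full answer regex) to show several conditions each inspecting the whole string from `\A` before the final character class does the consuming.

```ruby
# A lookahead tests a condition at the current position but consumes nothing.
m = "abc".match(/(?=b)/)
puts m.begin(0)    # 1: the lookahead succeeded at index 1
p m[0]             # "": zero characters were consumed

# Since nothing is consumed, each lookahead at \A sees the entire string.
two_rules = /\A(?!.*--)(?=.*[a-z])[a-z\d_-]+\z/i
puts "ab-1".match?(two_rules)   # true
puts "ab--1".match?(two_rules)  # false: the negative lookahead finds "--"
puts "12-3".match?(two_rules)   # false: the positive lookahead finds no letter
```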

r = /
    \A           # match start of string  
    (?!.*        # begin negative lookahead and match >= 0 characters
      (?:--|__)  # match -- or __ in a non-capture group
    )            # end negative lookahead
    (?![-_])     # do not match - or _ at the beginning of the string
    (?!.*[-_]\z) # do not match - or _ at the end of the string
    (?!          # begin negative lookahead
      .*-.*_     # match - followed by _ 
      |          # or
      .*_.*-     # match _ followed by - 
    )            # end negative lookahead
    (?=.*[a-z])  # match at least one letter 
    [a-z\d_-]+   # match one or more English letters, digits, _ or -
    \z           # match end of string
    /ix          # case indifference and free-spacing modes

 "a".match? r          #=> true   
 "aB32-41".match? r    #=> true
 "".match? r           #=> false (must match a letter)
 "123-4_5".match? r    #=> false (must match a letter)
 "-aB32-4_1".match? r  #=> false (cannot begin with -)
 "aB32-4_1-".match? r  #=> false (cannot end with -)
 "_aB32-4_1".match? r  #=> false (cannot begin with _)
 "aB32-4_1_".match? r  #=> false (cannot end with _)
 "aB32--4_1".match? r  #=> false (cannot contain --)
 "aB32-4__1".match? r  #=> false (cannot contain __)
 "aB32-4_1".match? r   #=> false (cannot contain both - and _)
 "123-4_5$".match? r   #=> false ($ is not a permitted character)
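The same pattern can also be run against the valid and invalid examples from the question. This harness uses the compact one-line form of the regex that the answer gives as equivalent; the variable names are only illustrative.

```ruby
# The answer's regex in its one-line form.
r = /\A(?!.*(?:--|__))(?![-_])(?!.*[-_]\z)(?!.*-.*_|.*_.*-)(?=.*[a-z])[a-z\d_-]+\z/i

valid   = %w[a1_b_2_hello 2b-ffg-er2 abs 123a]
invalid = %w[_a1_b_2_hello 2b-ffg_er2- __ -- a__ b--2]

valid.each   { |s| puts "#{s.ljust(15)} #{s.match?(r)}" }  # each prints true
invalid.each { |s| puts "#{s.ljust(15)} #{s.match?(r)}" }  # each prints false
```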

Without free-spacing mode, this regular expression is conventionally written on one line:

/\A(?!.*(?:--|__))(?![-_])(?!.*[-_]\z)(?!.*-.*_|.*_.*-)(?=.*[a-z])[a-z\d_-]+\z/i
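The two forms differ only in the `x` flag, which is what allows the longer version to carry whitespace and `#` comments. A toy illustration of how that mode treats spaces (the pattern `spaced` is made up for this purpose and is not part of the answer):

```ruby
# In free-spacing (x) mode, unescaped whitespace is ignored and # starts a comment.
spaced = /a b   # this comment is not part of the pattern
         c/x
puts "abc".match?(spaced)   # true: the spaces in the pattern are ignored
puts "a bc".match?(spaced)  # false: the pattern is effectively /abc/

# To match a literal space in x mode, escape it (or use \s).
puts "a b".match?(/a\ b/x)  # true
```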
Cary Swoveland answered Jan 29 '23 10:01


You could put A-Za-z0-9 in a character class, making sure at least one A-Za-z character is matched, and then, in a repetition of 0+ times, match either a hyphen or an underscore [-_] followed by 1+ characters from the character class [A-Za-z0-9]+.

Use a capturing group with a backreference so that only one kind of delimiter, - or _, is used throughout the string.

\A[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:([-_])[A-Za-z0-9]+(?:\1[A-Za-z0-9]+)*)?\z
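As a quick sanity check, the pattern can be written as a Ruby literal and run against the question's examples; the names `pattern`, `valid` and `invalid` are illustrative.

```ruby
# The backreference pattern from this answer, as a Ruby literal.
pattern = /\A[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:([-_])[A-Za-z0-9]+(?:\1[A-Za-z0-9]+)*)?\z/

valid   = %w[a1_b_2_hello 2b-ffg-er2 abs 123a]
invalid = %w[_a1_b_2_hello 2b-ffg_er2- __ -- a__ b--2]

valid.each   { |s| puts "#{s.ljust(15)} #{s.match?(pattern)}" }  # each prints true
invalid.each { |s| puts "#{s.ljust(15)} #{s.match?(pattern)}" }  # each prints false
```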

About the pattern

  • \A Start of string
  • [A-Za-z0-9]*[A-Za-z][A-Za-z0-9]* Match alphanumeric characters, at least 1 of which is A-Za-z
  • (?: Non capturing group
    • ([-_]) Capturing group 1, match either - or _
    • [A-Za-z0-9]+ Match 1+ times what is listed
    • (?: Non capturing group
      • \1[A-Za-z0-9]+ Backreference \1 to what is captured in group 1, to get consistent delimiters (this prevents matching a-b_c), then match 1+ times what is listed
    • )* Close non capturing group and repeat it 0+ times
  • )? Close non capturing group and make it optional
  • \z End of string
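The role of group 1 and `\1` can be inspected through `MatchData`. In this sketch, group 1 holds whichever delimiter appears first, `\1` forces every later delimiter to be the same one, and the group is `nil` when the optional part is not used.

```ruby
pattern = /\A[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:([-_])[A-Za-z0-9]+(?:\1[A-Za-z0-9]+)*)?\z/

p pattern.match("2b-ffg-er2")[1]    # "-"
p pattern.match("a1_b_2_hello")[1]  # "_"
p pattern.match("abs")[1]           # nil: the optional group did not take part

# Mixed delimiters fail: \1 is "-" when the engine reaches the "_".
p pattern.match("a-b_c")            # nil
```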


See this page for a detailed explanation about the anchors.
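One behavioral difference between the two answers may be worth checking: this pattern requires the letter to appear before the first delimiter, while the lookahead version accepts a letter anywhere. The question's examples do not cover this case, so which behavior is wanted depends on how "must contain at least 1 letter" is read. A small comparison (variable names are illustrative):

```ruby
# Patterns copied from the two answers.
lookahead = /\A(?!.*(?:--|__))(?![-_])(?!.*[-_]\z)(?!.*-.*_|.*_.*-)(?=.*[a-z])[a-z\d_-]+\z/i
backref   = /\A[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:([-_])[A-Za-z0-9]+(?:\1[A-Za-z0-9]+)*)?\z/

# Strings whose only letters come after a delimiter are where the two disagree.
%w[12-ab 1_2_c 2b-ffg-er2].each do |s|
  puts "#{s.ljust(12)} lookahead=#{s.match?(lookahead)} backref=#{s.match?(backref)}"
end
# 12-ab and 1_2_c: lookahead=true backref=false; 2b-ffg-er2: both true
```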

The fourth bird answered Jan 29 '23 10:01