Regex for [a-zA-Z0-9\-] with dashes allowed in between but not at the start or end

Tags:

python

regex

Update:

This question was an epic failure, but here's the working solution. It's based on Gumbo's answer (Gumbo's was close to working so I chose it as the accepted answer):

Solution:

r'(?=[a-zA-Z0-9\-]{4,25}$)^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$'
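Here is a minimal sketch that checks the accepted pattern against the examples from the question. The names USERNAME_RE and is_valid are hypothetical. It uses re.fullmatch() (Python 3.4+) instead of re.match() because the $ anchor also matches just before a trailing newline. With re.match(), a value such as "spam\n" would therefore be accepted.

```python
import re

# The accepted pattern from the update above (hypothetical name USERNAME_RE).
USERNAME_RE = re.compile(r'(?=[a-zA-Z0-9\-]{4,25}$)^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$')

def is_valid(value):
    """Return True if value is 4-25 chars of [a-zA-Z0-9-] with dashes only between alphanumerics."""
    # fullmatch() must consume the entire string, so a trailing "\n" is rejected.
    return USERNAME_RE.fullmatch(value) is not None

allowed = ["spam123-spam-eggs-eggs1", "spam123-eggs123", "spam", "1234", "eggs123"]
rejected = ["eggs1-", "-spam123", "spam--spam"]

for v in allowed + rejected:
    print(v, is_valid(v))
```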

Original Question (albeit, after 3 edits)

I'm using Python, and I'm not trying to extract the value. I just want to test that it fits the pattern.

Allowed values:

spam123-spam-eggs-eggs1
spam123-eggs123
spam
1234
eggs123

Not allowed values:

eggs1-
-spam123
spam--spam

I just can't have a dash at the start or the end. There is a question on here that works in the opposite direction by extracting the string value after the fact, but I simply need to test the value so that I can reject it. Also, it must be at least 4 and at most 25 characters long, and no two dashes can be adjacent.

Here's what I've come up with after some experimentation with lookbehind, etc:

# Nothing here
orokusaki asked Mar 26 '10 17:03


People also ask

What does the regular expression [a-z0-9\-] mean?

In a regular expression, [a-z] matches any lowercase letter and [0-9] matches any digit, so [a-z0-9] matches any single lowercase letter or digit. Adding \- to the class, as in [a-z0-9\-], also allows a literal hyphen.
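To see how character classes behave, here is a small sketch. It shows that each class matches exactly one character from its set, and that [a-zA-Z0-9\-] also accepts uppercase letters and a literal hyphen. The variable names are illustrative only.

```python
import re

# Each character class matches exactly one character from its set.
lower_or_digit = re.compile(r"[a-z0-9]")
with_upper_and_dash = re.compile(r"[a-zA-Z0-9\-]")

for ch in ["a", "Z", "7", "-", "_"]:
    print(repr(ch), bool(lower_or_digit.fullmatch(ch)), bool(with_upper_and_dash.fullmatch(ch)))
```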

Do dashes need to be escaped in regex?

In regular expressions, a hyphen ("-") inside a character class between two characters denotes a range; for example, [0-9] matches any digit from 0 to 9. To match a literal hyphen inside a character class, escape it with a backslash (\-) or place it first or last in the class, as in [a-z-]. Outside a character class, a hyphen has no special meaning and needs no escaping.
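The three hyphen cases can be compared directly. This is a minimal sketch, and the pattern names are illustrative. An escaped hyphen and a trailing hyphen inside a class behave the same way, and outside a class the hyphen is just an ordinary character.

```python
import re

# A hyphen between two characters inside a class forms a range: [0-9] is any digit.
digits = re.compile(r"[0-9]+")

# To match a literal hyphen inside a class, escape it or put it last (or first).
escaped = re.compile(r"[a-z\-]+")
trailing = re.compile(r"[a-z-]+")

# Outside a character class the hyphen is an ordinary character.
plain = re.compile(r"[a-z]+-[a-z]+")

for pattern in (escaped, trailing, plain):
    print(pattern.pattern, bool(pattern.fullmatch("spam-eggs")))
print(digits.pattern, bool(digits.fullmatch("2010")))
```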

What is the use of \\ in regex?

Use \\ in a regex to match a literal backslash (\). Python's re module also recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage return, \nnn for a three-digit octal code, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode code point, and \Uhhhhhhhh for an 8-digit Unicode code point.
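Here is a short sketch of backslash handling in Python. It uses a raw string (r"...") so that Python passes the backslashes through to the regex engine unchanged, and the path value is only an illustration.

```python
import re

# The Python literal "C:\\Users\\spam" holds two single backslash characters.
path = "C:\\Users\\spam"

# In the regex, \\ matches one literal backslash; the raw string (r"...")
# stops Python from interpreting the backslashes before re sees them.
print(re.findall(r"\\", path))
print(re.split(r"\\", path))

# Escape sequences inside a pattern: \x41, \u0041 and \U00000041 are all "A"; \t is a tab.
print(bool(re.fullmatch(r"\x41\u0041\U00000041\t", "AAA\t")))
```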

What does ?=.* mean in regex?

?= introduces a positive lookahead, a type of zero-width assertion. (?=...) requires that the text at the current position be followed by whatever the parentheses contain, but that text is not consumed or included in the match. For example, (?=.*\d) requires that zero or more characters followed by a digit appear ahead, again without consuming them.
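A lookahead can be tried in isolation with a pattern such as (?=.*\d). This minimal sketch uses illustrative names. It shows that the lookahead alone produces an empty match at position 0, and that in practice you combine it with a pattern that actually consumes characters.

```python
import re

# (?=.*\d) looks ahead for "zero or more characters, then a digit" without
# consuming anything, so any match is empty and sits at position 0.
has_digit_ahead = re.compile(r"(?=.*\d)")

print(has_digit_ahead.match("spam1"))  # an empty (zero-width) match
print(has_digit_ahead.match("spam"))   # None: no digit anywhere ahead

# Combined with a consuming pattern: an all-alphanumeric string with at least one digit.
alnum_with_digit = re.compile(r"(?=.*\d)[a-zA-Z0-9]+")
print(bool(alnum_with_digit.fullmatch("spam1")), bool(alnum_with_digit.fullmatch("spam")))
```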

What does [a-zA-Z] mean in a regex?

For example, the regular expression [A-Za-z] matches any single uppercase or lowercase letter. Inside a character set, a hyphen indicates a range of characters; for example, [A-Z] matches any one capital letter.


2 Answers

Try this regular expression:

^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$

This regular expression only allows hyphens as separators between sequences of one or more [a-zA-Z0-9] characters, so a hyphen can never appear at the start or end, or next to another hyphen.
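Here is a quick check of this first pattern, with SEGMENTS_RE as a hypothetical name. It enforces the dash rules but not the length limit, so a two-character value like "ab" still passes. That is the gap the lookahead in the later edit closes.

```python
import re

# Runs of [a-zA-Z0-9] joined by single hyphens; no length limit yet.
SEGMENTS_RE = re.compile(r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$")

for value in ["spam123-eggs123", "spam", "ab", "eggs1-", "-spam123", "spam--spam"]:
    print(value, bool(SEGMENTS_RE.fullmatch(value)))
```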


Edit: Following up on your comment: the expression (…)* allows the part inside the group to be repeated zero or more times. That means

a(bc)*

is the same as

a|abc|abcbc|abcbcbc|abcbcbcbc|…
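The equivalence can be checked with Python's re module. In this small sketch, the (-[a-zA-Z0-9]+)* group in the answer plays the same role as (bc)* here. When a group repeats, it captures only its last repetition, and it returns None if it never matched.

```python
import re

star = re.compile(r"a(bc)*")

for s in ["a", "abc", "abcbc", "abcb", "ab"]:
    print(s, bool(star.fullmatch(s)))

# A repeated group keeps only its last repetition; an unused group is None.
print(star.fullmatch("abcbc").group(1), star.fullmatch("a").group(1))
```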

Edit: Now that you have changed the requirements: since you probably don't want to limit the length of each hyphen-separated part, you will need a lookahead assertion to enforce the overall length:

(?=[a-zA-Z0-9-]{4,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
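To see what each half of this pattern contributes, the sketch below splits it apart. The lookahead checks only the overall length and character set. The main pattern checks only where hyphens may appear, and a value has to satisfy both. The names LENGTH_ONLY, SHAPE_ONLY, COMBINED and check are illustrative.

```python
import re

LENGTH_ONLY = re.compile(r"(?=[a-zA-Z0-9-]{4,25}$)")           # 4-25 allowed chars
SHAPE_ONLY = re.compile(r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$")     # hyphens only between runs
COMBINED = re.compile(r"(?=[a-zA-Z0-9-]{4,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$")

def check(value):
    # A zero-width match object is still truthy, so bool() works for the lookahead alone.
    return (bool(LENGTH_ONLY.match(value)),
            bool(SHAPE_ONLY.fullmatch(value)),
            bool(COMBINED.fullmatch(value)))

for value in ["spam", "ab", "a" * 26, "spam--spam"]:
    print(value, check(value))
```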
Gumbo answered Sep 21 '22 14:09


The current regex is simple and fairly readable. Rather than making it long and complicated, have you considered applying the other constraints with normal Python string processing tools?

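import re

def fits_pattern(string):
    # Length, doubled dashes and leading/trailing dashes use plain string methods.
    if (4 <= len(string) <= 25 and
            "--" not in string and
            not string.startswith("-") and
            not string.endswith("-")):
        # The regex only confirms that every character is allowed:
        # fullmatch() anchors at both ends and "+" covers the whole string.
        return re.fullmatch(r"[a-zA-Z0-9\-]+", string)
    else:
        return None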
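Taking this suggestion one step further, you can also check the characters without a regex. This minimal sketch assumes that only ASCII letters and digits are allowed. str.isalnum() on its own would also accept non-ASCII letters such as "ä", which is why the sketch adds str.isascii() (Python 3.7+). fits_pattern_no_regex and PATTERN are hypothetical names, and the accepted pattern is included only for comparison.

```python
import re

# The pattern from the accepted solution, used here only for comparison.
PATTERN = re.compile(r"(?=[a-zA-Z0-9\-]{4,25}$)^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$")

def fits_pattern_no_regex(value):
    """Hypothetical regex-free check: 4-25 ASCII letters/digits/dashes, dashes only between alphanumerics."""
    if not 4 <= len(value) <= 25:
        return False
    # Splitting on "-" yields an empty part for a leading, trailing or doubled
    # dash, and "".isalnum() is False, so all three cases are rejected here.
    return all(part.isascii() and part.isalnum() for part in value.split("-"))

samples = ["spam123-spam-eggs-eggs1", "spam123-eggs123", "spam", "1234", "eggs123",
           "eggs1-", "-spam123", "spam--spam", "abc", "spam_eggs", "späm"]

for value in samples:
    print(value, fits_pattern_no_regex(value), bool(PATTERN.fullmatch(value)))
```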
Mike Graham answered Sep 18 '22 14:09