I'm trying to match a SEDOL (exactly 7 chars: 6 alphanumeric chars followed by 1 numeric char).
My regex:
([A-Z0-9]{6})[0-9]{1}
matches correctly, but strings longer than 7 chars that begin with a valid match also match (if you see what I mean :)). For example:
B3KMJP4
matches correctly but so does:
B3KMJP4x
which shouldn't match.
Can anyone show me how to avoid this?
The ‹ ^ › and ‹ $ › anchors ensure that the regex matches the entire subject string; otherwise, it could match 10 characters within longer text. The ‹ [A-Z] › character class matches any single uppercase character from A to Z, and the interval quantifier ‹ {1,10} › repeats the character class from 1 to 10 times.
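To make the anchor behaviour concrete, here is a minimal Python sketch. It assumes the pieces described above are assembled as `^[A-Z]{1,10}$`; the names `LIMITED_UPPER`, `UNANCHORED` and `is_short_upper` are made up for illustration.

```python
import re

# Assumed assembly of the pieces described above: anchors, class, quantifier.
LIMITED_UPPER = re.compile(r"^[A-Z]{1,10}$")

# Same pattern without anchors, for comparison.
UNANCHORED = re.compile(r"[A-Z]{1,10}")


def is_short_upper(text):
    """Return True if text is 1 to 10 uppercase ASCII letters and nothing else."""
    return LIMITED_UPPER.match(text) is not None


# Without anchors, the pattern finds up to 10 letters inside longer text.
partial = UNANCHORED.search("xxABCDEFGHIJKLxx")
```

With the anchors, an 11-letter string is rejected outright; without them, `search` simply returns the first 10 letters it can find.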
To match a character that has special meaning in regex, you need to use an escape sequence: prefix the character with a backslash ( \ ). E.g., regex \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (backslash).
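A short Python sketch of the escapes listed above. The helper name `contains_literal` is hypothetical; `re.escape` is the standard-library function that adds the backslashes for you.

```python
import re

# Escaped by hand: each pattern matches exactly one literal character.
dot = re.fullmatch(r"\.", ".")
plus = re.fullmatch(r"\+", "+")
paren = re.fullmatch(r"\(", "(")
backslash = re.fullmatch(r"\\", "\\")

# For contrast: an unescaped dot matches any character, an escaped one does not.
any_char = re.fullmatch(r".", "a")
literal_dot_only = re.fullmatch(r"\.", "a")


def contains_literal(needle, haystack):
    """Search for needle as literal text, escaping any regex metacharacters."""
    return re.search(re.escape(needle), haystack) is not None
```

Using raw strings (`r"..."`) keeps Python's own backslash handling from interfering with the regex escapes.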
A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.
A dollar sign at the end of the regex (called an anchor) signifies the end of the string:
^([A-Z0-9]{6})\d$
I also added "^" at the start, which signifies the start of the string and prevents matching xB3KMJP4. I also simplified the original regex by replacing [0-9]{1} with \d.
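For anyone who wants to try it, here is a minimal sketch in Python (the question doesn't say which language or regex flavor is in use, so that choice is an assumption, and the names `SEDOL_RE` and `looks_like_sedol` are made up). It only checks the 7-character shape, not the check digit.

```python
import re

# Anchored pattern from the answer above.
# re.ASCII restricts \d to 0-9; without it, Python's \d also matches other Unicode digits.
SEDOL_RE = re.compile(r"^([A-Z0-9]{6})\d$", re.ASCII)


def looks_like_sedol(text):
    """Return True if text has the 7-character SEDOL shape (no checksum validation)."""
    return SEDOL_RE.match(text) is not None


# In Python, "$" also matches just before a single trailing newline.
# re.fullmatch on the unanchored pattern avoids that edge case.
STRICT_RE = re.compile(r"[A-Z0-9]{6}\d", re.ASCII)


def looks_like_sedol_strict(text):
    """Same shape check, but rejects a trailing newline as well."""
    return STRICT_RE.fullmatch(text) is not None
```

The strict variant is only worth the bother if the input may carry a trailing newline, e.g. lines read from a file without stripping.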
By the way, as per Wikipedia, for the first character, vowels are not used. I'm not quite sure if that's a rule or a convention.
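If you do want to enforce that, one possible sketch in Python is below. It assumes the restriction applies only to the first character and that a leading digit is still allowed, as in the original pattern; the names `NO_VOWEL_FIRST` and `looks_like_sedol_no_vowel` are hypothetical. Check the official SEDOL rules before relying on it.

```python
import re

# First character: a digit or an uppercase consonant (A, E, I, O, U excluded).
# Remaining shape is unchanged: 5 more alphanumerics, then 1 digit.
NO_VOWEL_FIRST = re.compile(r"[0-9B-DF-HJ-NP-TV-Z][A-Z0-9]{5}[0-9]")


def looks_like_sedol_no_vowel(text):
    """Shape check that also rejects a vowel in the first position (assumed rule)."""
    return NO_VOWEL_FIRST.fullmatch(text) is not None
```

`fullmatch` plays the role of the "^" and "$" anchors here, so the pattern itself is written without them.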