I am quite new to regex thing and need regex for first name which satisfies following conditions:
‘
and –
cannot be together (e.g. John’-s is not allowed)‘
and –
(e.g. John ‘s is not allowed)Can anyone help? I tried this ^([a-z]+['-]?[ ]?|[a-z]+['-]?)*?[a-z]$
but it's not working as expected.
Regexes are notoriously difficult to write and maintain.
One technique that I've used over the years is to annotate my regexes by using named capture groups. It's not perfect, but can greatly help with the readability and maintainability of your regex.
Here is a regex that meets your requirements.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
It is split down into the following parts:
1) (?<firstchar>(?=[A-Za-z]))
This ensures the first character is an alpha character, upper or lowercase.
2) (?<alphachars>[A-Za-z])
We allow more alpha chars.
3) (?<specialchars>[A-Za-z]['-](?=[A-Za-z]))
We allow special characters, but only with an alpha character before and after.
4) (?<spaces> (?=[A-Za-z]))
We allow spaces, but only one space, which must be followed by alpha characters.
You should use a testing tool when writing regexes, I'd recommend https://regex101.com/
You can see from the screenshot below how this regex performs.
Take the regex I've given you, run it in https://regex101.com/ with samples you'd like to match against, and tweak it to fit your requirements. Hopefully I've given you enough information to be self sufficient in customising it to your needs.
You can use this link to run the regex https://regex101.com/r/O2wFfi/1/
I've updated to address the issue in your comment, rather than just give you the code, I will explain the problem and how I fixed it.
For your example "Sam D'Joe", if we run the original regex, the following happens.
^(?<firstchar>[A-Za-z])((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-][A-Za-z])|(?<spaces> [A-Za-z]))*$
1) ^
matches the start of the string
2) (?<firstchar>[A-Za-z])
matches the first character
3) (?<alphachars>[A-Za-z])
matches every character up to the space
4) (?<spaces> [A-Za-z])
matches the space and the subsequent alpha char
This is where we run into a problem. Our "specialchars" part of the regex matches an alpha char, our special char and then another alpha char ((?<specialchars>[A-Za-z]['-](?=[A-Za-z]))
).
The thing you need to know about regexes, is each time you match a character, that character is then consumed. We've already matched the alpha char before the special character, so our regex will never match.
Each step actually looks like this:
1) ^
matches the start of the string
2) (?<firstchar>[A-Za-z])
matches the first character
3) (?<alphachars>[A-Za-z])
matches every character up to the space
4) (?<spaces> [A-Za-z])
matches the space and the subsequent alpha char
and then we're left with the following
We cannot match this, because one of our rules is "An alphabet should be present before and after the special characters ‘ and –".
Regex has a concept called "lookahead". A lookahead allows you to match a character without consuming it!
The syntax for a lookahead is ?=
followed by what you want to match. E.g. ?=[A-Z]
would look ahead for a single character that is an uppercase letter.
We can fix our regex, by using lookaheads.
1) ^
matches the start of the string
2) (?<firstchar>[A-Za-z])
matches the first character
3) (?<alphachars>[A-Za-z])
matches every character up to the space
4) We now change our "spaces" regex, to lookahead to the alpha char, so we don't consume it. We change (?<spaces> [A-Za-z])
to (?<spaces> ?=[A-Za-z])
. This matches the space and looks ahead to the subsequent alpha char, but doesn't consume it.
5) (?<specialchars>[A-Za-z]['-][A-Za-z])
matches the alpha char, the special char, and the subsequent alpha char.
6) We use a wildcard to repeat matching our previous 3 rules multiple times, and we match until the end of the line.
I also added lookaheads to the "firstchar", "specialchars" and "spaces" capture groups, I've bolded the changes below.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With