Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex for first name

Tags:

c#

regex

I am quite new to regex thing and need regex for first name which satisfies following conditions:

  1. First Name must contain letters only. It may contain spaces, hyphens, or apostrophes.
  2. It must begin with letters.
  3. All other characters and numbers are not valid.
  4. Special characters and cannot be together (e.g. John’-s is not allowed)
  5. An alphabet should be present before and after the special characters and (e.g. John ‘s is not allowed)
  6. Two consecutive spaces are not allowed (e.g. Annia St is not allowed)

Can anyone help? I tried this ^([a-z]+['-]?[ ]?|[a-z]+['-]?)*?[a-z]$ but it's not working as expected.

like image 676
SKS Avatar asked Jan 03 '23 04:01

SKS


1 Answers

Regexes are notoriously difficult to write and maintain.

One technique that I've used over the years is to annotate my regexes by using named capture groups. It's not perfect, but can greatly help with the readability and maintainability of your regex.

Here is a regex that meets your requirements.

^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$

It is split down into the following parts:

1) (?<firstchar>(?=[A-Za-z])) This ensures the first character is an alpha character, upper or lowercase.

2) (?<alphachars>[A-Za-z]) We allow more alpha chars.

3) (?<specialchars>[A-Za-z]['-](?=[A-Za-z])) We allow special characters, but only with an alpha character before and after.

4) (?<spaces> (?=[A-Za-z])) We allow spaces, but only one space, which must be followed by alpha characters.

You should use a testing tool when writing regexes, I'd recommend https://regex101.com/

You can see from the screenshot below how this regex performs.

Take the regex I've given you, run it in https://regex101.com/ with samples you'd like to match against, and tweak it to fit your requirements. Hopefully I've given you enough information to be self sufficient in customising it to your needs.

enter image description here

You can use this link to run the regex https://regex101.com/r/O2wFfi/1/

Edit

I've updated to address the issue in your comment, rather than just give you the code, I will explain the problem and how I fixed it.

For your example "Sam D'Joe", if we run the original regex, the following happens.

^(?<firstchar>[A-Za-z])((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-][A-Za-z])|(?<spaces> [A-Za-z]))*$

1) ^ matches the start of the string enter image description here

2) (?<firstchar>[A-Za-z]) matches the first character enter image description here

3) (?<alphachars>[A-Za-z]) matches every character up to the space enter image description here

4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char enter image description here

Matches consume the characters that they match

This is where we run into a problem. Our "specialchars" part of the regex matches an alpha char, our special char and then another alpha char ((?<specialchars>[A-Za-z]['-](?=[A-Za-z]))).

The thing you need to know about regexes, is each time you match a character, that character is then consumed. We've already matched the alpha char before the special character, so our regex will never match.

Each step actually looks like this:

1) ^ matches the start of the string enter image description here

2) (?<firstchar>[A-Za-z]) matches the first character enter image description here

3) (?<alphachars>[A-Za-z]) matches every character up to the space enter image description here

4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char enter image description here

and then we're left with the following enter image description here

We cannot match this, because one of our rules is "An alphabet should be present before and after the special characters ‘ and –".

Lookahead

Regex has a concept called "lookahead". A lookahead allows you to match a character without consuming it!

The syntax for a lookahead is ?= followed by what you want to match. E.g. ?=[A-Z] would look ahead for a single character that is an uppercase letter.

We can fix our regex, by using lookaheads.

1) ^ matches the start of the string enter image description here

2) (?<firstchar>[A-Za-z]) matches the first character enter image description here

3) (?<alphachars>[A-Za-z]) matches every character up to the space enter image description here

4) We now change our "spaces" regex, to lookahead to the alpha char, so we don't consume it. We change (?<spaces> [A-Za-z]) to (?<spaces> ?=[A-Za-z]). This matches the space and looks ahead to the subsequent alpha char, but doesn't consume it. enter image description here

5) (?<specialchars>[A-Za-z]['-][A-Za-z]) matches the alpha char, the special char, and the subsequent alpha char. enter image description here

6) We use a wildcard to repeat matching our previous 3 rules multiple times, and we match until the end of the line. enter image description here

I also added lookaheads to the "firstchar", "specialchars" and "spaces" capture groups, I've bolded the changes below.

^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$

like image 183
Doctor Jones Avatar answered Jan 12 '23 15:01

Doctor Jones