
RegEx for name: Any language but first letter must be capital

Tags:

regex

I have a requirement to accept a first name as input and check that its first letter is a capital and that at most one space follows the name.

This RegEx works for 'Bob ':

^[A-Z][A-Za-z\p{L}]+[\s,.'\-]?[a-zA-Z\p{L}]*$

An additional requirement is to allow names in any language, which means allowing Unicode letters.

This RegEx works for a Russian name: 'Афанасий'

^[A-Z\p{L}][A-Za-z\p{L}]+[\s,.'\-]?[a-zA-Z\p{L}]*$

... However, while it allows Unicode, it also lets 'bob' through, even though the first letter is lowercase.

Is there any way to allow Unicode and still reject input whose first letter is not a capital, using only a RegEx?

I could make some code changes to get round this issue but it would be nice to be able to keep it all in the RegEx value without making code changes.

Kev asked Sep 20 '16 09:09




1 Answer

Any Unicode uppercase letter can be matched with \p{Lu}.

Use

^\p{Lu}\p{L}+[\s,.'\-]?\p{L}*$

or

^\p{Lu}\p{L}+(?:[\s,.'-]\p{L}+)?$

See the regex demo 1 and regex demo 2. The second regex is more precise: it does not allow a trailing whitespace character, comma, or any other character from the [\s,.'-] class, because such a character must be followed by at least one letter.
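As a rough illustration of that difference, here is a minimal TypeScript sketch that runs both patterns over a few inputs. The variable names (`lenient`, `strict`, `samples`) and the sample strings are my own and are not taken from the demos; it assumes a JavaScript engine that supports Unicode property escapes (the `u` flag).

```typescript
// Both patterns from the answer. They are built from strings, so each
// backslash is doubled; the "u" flag enables \p{...} property escapes.
const lenient = new RegExp("^\\p{Lu}\\p{L}+[\\s,.'\\-]?\\p{L}*$", "u");
const strict = new RegExp("^\\p{Lu}\\p{L}+(?:[\\s,.'-]\\p{L}+)?$", "u");

// Illustrative inputs (hypothetical, chosen to show where the patterns differ).
const samples: string[] = ["Bob", "Bob ", "bob", "Афанасий", "Mary-Jane", "Bob,"];

for (const name of samples) {
  // Print the quoted input followed by the result of each pattern.
  console.log(JSON.stringify(name), lenient.test(name), strict.test(name));
}
```

With these inputs, 'Bob ' and 'Bob,' pass the first pattern but fail the second, while 'bob' fails both. If a single trailing space must be accepted, as in the question, the first pattern is the one to use.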

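Note that there is no point in using [A-Za-z\p{L}], since \p{L} already matches everything that [a-zA-Z] does. For the same reason, [A-Z\p{L}] at the start of the pattern accepts any letter, including lowercase ones.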
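To see why the pattern in the question let 'bob' through, the first-character classes can be compared in isolation. This is only a sketch: `anyLetter`, `upperOnly` and `startsWithCapital` are hypothetical names, and it again assumes an engine with `u`-flag support.

```typescript
// First-character class from the question's second pattern:
// \p{L} covers every letter, so adding A-Z changes nothing.
const anyLetter = new RegExp("^[A-Z\\p{L}]", "u");

// First-character class from the answer: uppercase letters only.
const upperOnly = new RegExp("^\\p{Lu}", "u");

// Hypothetical helper: true when the first character is an uppercase letter.
function startsWithCapital(name: string): boolean {
  return upperOnly.test(name);
}

console.log(anyLetter.test("bob"), startsWithCapital("bob"));           // true false
console.log(anyLetter.test("Афанасий"), startsWithCapital("Афанасий")); // true true
console.log(anyLetter.test("афанасий"), startsWithCapital("афанасий")); // true false
```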

Pattern details:

  • ^ - start of string
  • \p{Lu} - an uppercase Unicode letter
  • \p{L}+ - one or more Unicode letters
  • (?:[\s,.'-]\p{L}+)? - an optional (zero or one) sequence of
    • [\s,.'-] - a whitespace character, a comma, a period, an apostrophe or a hyphen
    • \p{L}+ - 1 or more Unicode letters
  • $ - end of string.
Wiktor Stribiżew answered Nov 15 '22 03:11