I have a requirement to accept a first name as input and check that the first letter is a capital and that at most one space follows the name.
This RegEx works for 'Bob ':
^[A-Z][A-Za-z\p{L}]+[\s,.'\-]?[a-zA-Z\p{L}]*$
An extra requirement is to allow names in any language, which means allowing Unicode letters.
This RegEx works for a Russian name: 'Афанасий'
^[A-Z\p{L}][A-Za-z\p{L}]+[\s,.'\-]?[a-zA-Z\p{L}]*$
However, while it allows Unicode, it also accepts 'bob' with a lowercase first letter.
Is there any way to allow Unicode letters and still reject a first letter that is not a capital (using a RegEx)?
I could make some code changes to get round this issue, but it would be nice to keep it all in the RegEx value without making code changes.
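To reproduce the behaviour described above, here is a minimal sketch. It assumes a JavaScript/TypeScript engine (the question does not say which regex flavour is in use), where \p{...} only works with the "u" flag. The variable names asciiFirst and anyLetterFirst are made up for illustration; the patterns themselves are copied from the question.

```typescript
// Assumption: JavaScript/TypeScript RegExp; the "u" flag is needed for \p{L}.
// First pattern from the question: the first letter must be in A-Z.
const asciiFirst = new RegExp("^[A-Z][A-Za-z\\p{L}]+[\\s,.'\\-]?[a-zA-Z\\p{L}]*$", "u");
// Second pattern from the question: the first letter may be A-Z or any \p{L}.
const anyLetterFirst = new RegExp("^[A-Z\\p{L}][A-Za-z\\p{L}]+[\\s,.'\\-]?[a-zA-Z\\p{L}]*$", "u");

const samples: string[] = ["Bob ", "bob", "Афанасий"];
for (const name of samples) {
  // Prints the name followed by the result of each pattern.
  console.log(JSON.stringify(name), asciiFirst.test(name), anyLetterFirst.test(name));
}
```

The second pattern accepts 'bob' because \p{L} covers lowercase letters as well, so adding it to the first character class removes the uppercase restriction.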
Example 2: Convert the first letter to uppercase using a regex
The regex pattern /^./ matches the first character of a string, and the toUpperCase() method converts that character to uppercase.
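A short sketch of that replacement in TypeScript might look as follows. The helper name capitalizeFirst is hypothetical. Note that this converts the input rather than validating it, so it is a different approach from rejecting a lowercase first letter.

```typescript
// Hypothetical helper: upper-cases the first character of a string.
function capitalizeFirst(text: string): string {
  // /^./ matches only the first character (if there is one);
  // the callback receives that character and returns its uppercase form.
  return text.replace(/^./, (first) => first.toUpperCase());
}

console.log(capitalizeFirst("bob")); // prints "Bob"
```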
Using the isupper() method
One way to check this is Python's built-in string method isupper(). Access the first letter of the string by indexing and call isupper() on it; the method returns True if the character is uppercase and False otherwise.
capwords()
capwords() is a Python function (in the standard string module) that converts the first letter of every word to uppercase and every other letter to lowercase. It takes the string as its argument and returns the capitalized string.
Short answer: yes.
Any Unicode uppercase letter can be matched with \p{Lu}.
Use
^\p{Lu}\p{L}+[\s,.'\-]?\p{L}*$
or
^\p{Lu}\p{L}+(?:[\s,.'-]\p{L}+)?$
See the regex demo 1 and regex demo 2. The second regex is more precise as it won't allow a trailing whitespace, comma, etc. (anything defined in the [\s,.'-] character class).
Note that there is no point in using [A-Za-z\p{L}] since \p{L} already matches [a-zA-Z].
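The redundancy can be checked quickly. This is a small sketch assuming a JavaScript/TypeScript engine with the "u" flag; the names letter, asciiLetters and allCovered are illustrative only.

```typescript
// Assumption: JavaScript/TypeScript RegExp with the "u" flag for \p{L}.
const letter = new RegExp("^\\p{L}$", "u");
const asciiLetters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Every ASCII letter is already matched by \p{L} on its own.
const allCovered = asciiLetters.split("").every((ch) => letter.test(ch));
console.log(allCovered); // prints true
```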
Pattern details:
^ - start of string
\p{Lu} - an uppercase Unicode letter
\p{L}+ - one or more Unicode letters
(?:[\s,.'-]\p{L}+)? - one or zero (optional) sequence of
  [\s,.'-] - a whitespace, a comma, a dot, an apostrophe or a hyphen
  \p{L}+ - 1 or more Unicode letters
$ - end of string.
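As a usage sketch, both suggested patterns can be wrapped in a small validator. This assumes a JavaScript/TypeScript engine, where the "u" flag is required for \p{Lu} and \p{L}; other flavours (for example .NET or Java) support these classes without a flag. The function name isValidFirstName and its allowTrailingSeparator parameter are hypothetical.

```typescript
// Assumption: JavaScript/TypeScript RegExp; the "u" flag enables \p{Lu} and \p{L}.
// Second pattern from the answer: a separator must be followed by more letters.
const strictName = new RegExp("^\\p{Lu}\\p{L}+(?:[\\s,.'-]\\p{L}+)?$", "u");
// First pattern from the answer: a single trailing separator is tolerated.
const lenientName = new RegExp("^\\p{Lu}\\p{L}+[\\s,.'\\-]?\\p{L}*$", "u");

// Hypothetical helper: picks one of the two patterns and tests the name.
function isValidFirstName(name: string, allowTrailingSeparator = false): boolean {
  const pattern = allowTrailingSeparator ? lenientName : strictName;
  return pattern.test(name);
}

console.log(isValidFirstName("Bob"));        // true
console.log(isValidFirstName("bob"));        // false: lowercase first letter
console.log(isValidFirstName("Афанасий"));   // true: uppercase Cyrillic first letter
console.log(isValidFirstName("Bob "));       // false: trailing space, strict pattern
console.log(isValidFirstName("Bob ", true)); // true: trailing space, lenient pattern
```

Because both patterns require \p{Lu} followed by \p{L}+, a single-letter name such as 'B' is rejected by either one.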