Allowed characters are (at least) A-Z, a-z, 0-9, ö, Ö, ä, ä, å, Å and german, latvian, estonian (if any) special chars? Is there ready-made method or do i have to make blacklist (non-allowed chars) and regular expressions IsMatch? If no ready-made how to use blacklist?
*[^a-zA-Z0-9]. *$ tests for any character other than a-z, A-Z and 0-9. Thus if it finds any character other than these, it returns true(means non-alphanumeric character).
Python string isalnum() function returns True if it's made of alphanumeric characters only. A character is alphanumeric if it's either an alpha or a number. If the string is empty, then isalnum() returns False .
A common solution to remove all non-alphanumeric characters from a String is with regular expressions. The idea is to use the regular expression [^A-Za-z0-9] to retain only alphanumeric characters in the string. You can also use [^\w] regular expression, which is equivalent to [^a-zA-Z_0-9] .
Non-Alphanumeric characters are the other characters on your keyboard that aren't letters or numbers, e.g. commas, brackets, space, asterisk and so on. Any character that is not a number or letter (in upper or lower case) is non-alphanumeric.
I don't know how special characters from all those languages are categorised, but you could check if the Char.IsLetterOrDigit
method matches what you want to do. It works at least for the digits and letters I tested:
string test = "Aasdf345ÅÄÖåäöéÉóÓüÜïÏôÔ";
if (test.All(Char.IsLetterOrDigit)) { ... }
The Char.IsLetterOrDigit
returns true for characters that are categorised in Unicode as UppercaseLetter, LowercaseLetter, TitlecaseLetter, ModifierLetter, OtherLetter, or DecimalDigitNumber.
Investigate char.IsLetterOrDigit(char)
.
For example:
myString.All(c => char.IsLetterOrDigit(c));
A blacklist for characters is likely pretty large :-)
You can use the regular expression
^[\d\p{L}]+$
to match decimal digits and letters, regardless of script.
This regular expression consists of a character class containing the shorthands \d
– which contains every digit (230 in total in the BMP) and \p{L}
which contains every Unicode character classified as a "letter" (46817 in the BMP). Said character class is then repeated at least once and embedded between ^
and $
– the string start and end anchors, so it matches the complete string.
For some regex engines, since you're only interested in Latin letters, apparently, you could also use
^[\d\p{Letter}]+$
However, .NET doesn't support this. The first regex mentioned above actually catches everything that's a digit or a letter in any script. So it will dutifully match on Indian or Arabic numerals and Hebrew, Cyrillic and other non-Latin scripts. Depending on what you want this may not be appropriate.
If that poses a problem, then I see no better option than to explicitly list the characters you want to allow. However, I consider it dangerous to assume that text in a certain language is always restricted to that language's script. If I were to write a Czech or Polish name in a German text, then I'd likely need more than just [a-zA-ZäöüÄÖÜß]
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With