
How to find out which chars are defined as alphanumeric for a given locale

With Python regex matching, the meaning of \w and related character classes is affected by the re.LOCALE flag:

\w

When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale.

So we want to find out which characters are defined as alphanumeric for a given locale. Say we ran 'locale -a' to list the locales on the system and want this information for one of them. Is there a quick way to get it: a Python snippet or one-liner, a shell command, or reference material somewhere?

Basel Shishani asked Mar 11 '12 04:03


1 Answer

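In Python 2, use string.letters. Its value is locale-dependent and is updated when locale.setlocale() is called. (string.letters was removed in Python 3, so this only works on Python 2.)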

Example:

>>> import locale
>>> import string
>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
'en_US.UTF-8'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
>>> locale.setlocale(locale.LC_ALL, 'de_DE')
'de_DE'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> 
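Since string.letters does not exist in Python 3, here is a minimal sketch of a similar check there. It assumes an 8-bit locale encoding, because re.LOCALE only applies to bytes patterns and works one byte at a time. The helper name locale_word_bytes is hypothetical, not part of any library; it simply tests each byte value against \w under re.LOCALE.

```python
import locale
import re


def locale_word_bytes(locale_name=None):
    """Return the byte values (0..255) that match \\w under re.LOCALE.

    locale_name is a hypothetical parameter: a name as listed by
    'locale -a'. None means "use the current LC_CTYPE setting".
    """
    # Remember the current LC_CTYPE setting so it can be restored afterwards.
    previous = locale.setlocale(locale.LC_CTYPE)
    try:
        if locale_name is not None:
            locale.setlocale(locale.LC_CTYPE, locale_name)
        # re.LOCALE is only valid with bytes patterns in Python 3.
        pattern = re.compile(rb"\w", re.LOCALE)
        return bytes(b for b in range(256) if pattern.match(bytes([b])))
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)


if __name__ == "__main__":
    # The "C" locale is always available; other names depend on the system.
    print(locale_word_bytes("C"))
```

Pass one of the names printed by 'locale -a' instead of "C" to inspect that locale; locale.setlocale() raises locale.Error if the name is not installed. The result is in the locale's own byte encoding, so it needs decoding with the matching codec to be read as text. For multi-byte encodings such as UTF-8 this byte-by-byte check cannot see non-ASCII characters.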
torek answered Oct 26 '22 09:10