
Python isalpha() and scandics

Is there a way to make Python's isalpha() method recognise scandics (Scandinavian letters such as ä, ö and å)? I have tried the following:

>>> import locale
>>> locale.getlocale()
(None, None)
>>> 'thisistext'.isalpha()
True
>>> 'äöå'.isalpha()
False
>>> locale.setlocale(locale.LC_ALL,"")
'Finnish_Finland.1252'
>>> locale.getlocale()
('Finnish_Finland', '1252')
>>> 'äöå'.isalpha()
False
user250765 asked Dec 10 '22 12:12



2 Answers

The simplest way is to use Unicode strings, if that is acceptable in your case. Just put a u prefix before the string literal:

>>> u'привіт'.isalpha()
True

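If the code lives in a source file rather than the interactive prompt, also put this encoding declaration on the first line so that Python reads the non-ASCII literal correctly: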

# -*- coding: utf-8 -*-
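
A minimal sketch of the same idea for Python 3, where every str is already Unicode text and the u prefix is optional. The question does not state a Python version, so treat this as an assumption; the variable names are illustrative only.

```python
# Python 3 sketch: str literals are Unicode text, so isalpha() uses
# Unicode character properties and does not depend on the locale.
text = "\u00e4\u00f6\u00e5"  # the letters ä, ö, å from the question
text_is_alpha = text.isalpha()

# bytes are a separate type; bytes.isalpha() only recognises ASCII letters.
raw = text.encode("cp1252")
raw_is_alpha = raw.isalpha()

# Decoding the bytes back to str restores the Unicode-aware behaviour.
round_trip_is_alpha = raw.decode("cp1252").isalpha()

print(text_is_alpha, raw_is_alpha, round_trip_is_alpha)  # True False True
```

In Python 3 the remaining risk is at the boundary: text read from files or sockets arrives as bytes and has to be decoded with the right codec before isalpha() is called.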
Oleksandr Kravchuk answered Dec 19 '22 09:12



It looks like what you have in your string constant is NOT a byte string encoded in cp1252, which is what is required to make str.isalpha work properly in your locale. You don't say in what environment you typed that. I can tell from the way that locale responds that you are on Windows; perhaps you are getting UTF-8 from some IDE or cp850 from a Command Prompt window.
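
To illustrate why the source of the bytes matters, here is a small Python 3 sketch (not the asker's Python 2 environment, which is unknown) that encodes the same three letters with the three codecs mentioned above. The names `encoded`, `misread_cp850` and `misread_utf8` are made up for this example.

```python
# The same three letters become different bytes under each encoding.
text = "\u00e4\u00f6\u00e5"  # ä, ö, å

encoded = {name: text.encode(name) for name in ("cp1252", "utf-8", "cp850")}
for name, raw in encoded.items():
    print(name, raw)

# Bytes produced by one codec but interpreted as cp1252 turn into
# unrelated characters, several of which are not letters at all.
misread_cp850 = encoded["cp850"].decode("cp1252")
misread_utf8 = encoded["utf-8"].decode("cp1252")
print(ascii(misread_cp850), misread_cp850.isalpha())
print(ascii(misread_utf8), misread_utf8.isalpha())
```

If the bytes that reach isalpha() come from a UTF-8 editor or a cp850 console while the locale expects cp1252, the check sees punctuation and symbols rather than letters, which is consistent with the False result in the question.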

What you see on your screen is often of very little help in debugging. What you see is NOT what you have got. The repr built-in function is (or wants to be) your friend. It will show unambiguously in ASCII what you actually have. [Python 3: repr is renamed ascii and there is a new repr which is not what you want]

Try typing s = "your string constant with 'accented' letters" then print repr(s) and edit your question to show the results (copy/paste, don't retype). Also say what Python version you are using.
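
For Python 3, the equivalent inspection step might look like the sketch below. It assumes the string really contains the three letters from the question; on Python 2 the print repr(s) form above applies instead.

```python
# Python 3 sketch: ascii() plays the role that repr() plays in Python 2,
# escaping every non-ASCII character so the result is unambiguous.
s = "\u00e4\u00f6\u00e5"  # ä, ö, å

shown = ascii(s)
print(shown)  # '\xe4\xf6\xe5'

# Python 3's repr() leaves printable non-ASCII characters as they are,
# so its output still depends on how the terminal renders them.
readable = repr(s)
print(readable == shown)  # False
```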

Another would-be pal is `unicodedata.name` ... see below.

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'Finnish')
'Finnish_Finland.1252'
>>> s = '\xe4\xf6\xe5'
>>> import unicodedata
>>> for c in s:
...     u = c.decode('1252')
...     print repr(c), repr(u), unicodedata.name(u, '<no name>')
...
'\xe4' u'\xe4' LATIN SMALL LETTER A WITH DIAERESIS
'\xf6' u'\xf6' LATIN SMALL LETTER O WITH DIAERESIS
'\xe5' u'\xe5' LATIN SMALL LETTER A WITH RING ABOVE
>>> s.isalpha()
True

You can compare the above results with this chart.
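
The session above is Python 2. A rough Python 3 counterpart, offered as a sketch rather than a definitive port, decodes the cp1252 bytes first and then inspects each character; `raw`, `decoded` and `names` are illustrative names.

```python
import unicodedata

# The cp1252 bytes from the session above.
raw = b"\xe4\xf6\xe5"
decoded = raw.decode("cp1252")

# Look up the official Unicode name of each decoded character.
names = [unicodedata.name(ch, "<no name>") for ch in decoded]

# Iterating over bytes yields integers in Python 3, hence hex().
for byte, ch, name in zip(raw, decoded, names):
    print(hex(byte), ascii(ch), name)

# str.isalpha() in Python 3 uses Unicode properties, not the locale.
print(decoded.isalpha())  # True
```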

John Machin answered Dec 19 '22 09:12
