Python: How to check if a unicode string contains a cased character?

I'm writing a filter that checks whether a Unicode string (UTF-8 encoded) contains no uppercase characters, in any language. It's fine if the string contains no cased characters at all.

For example: 'Hello!' will not pass the filter, but "!" should pass the filter, since "!" is not a cased character.

I planned to use the islower() method, but in the example above, "!".islower() will return False.

According to the Python Docs, "The python unicode method islower() returns True if the unicode string's cased characters are all lowercase and the string contained at least one cased character, otherwise, it returns False."
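The behaviour described above can be reproduced with a short sketch. It assumes Python 3, where every str is already Unicode, so the unicode() call used elsewhere in this question is not needed; the sample strings are only illustrative.

```python
# Python 3 sketch: str.islower() needs at least one cased character
# before it can return True.
samples = ["hello!", "Hello!", "!"]

# Map each sample to the result of islower().
results = {s: s.islower() for s in samples}

for s, r in results.items():
    print(repr(s), r)
```

The last sample, "!", has no cased characters, so islower() reports False even though the string contains no uppercase.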

Since the method also returns False when the string contains no cased characters at all, e.g. "!", I want to check first whether the string contains any cased character.

Something like this:

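# passes_filter is only an illustrative wrapper name, so the returns are valid
def passes_filter(string):
    # check first if it contains cased characters
    if not contains_cased(string):
        return True

    return string.islower()

string = unicode("!@#$%^", 'utf-8')
passes_filter(string)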

Any suggestions for a contains_cased() function?

Or probably a different implementation approach?

Thanks!

asked Aug 18 '10 02:08

Albert

2 Answers

Here is the full scoop on Unicode character categories.

Letter categories include:

Ll -- lowercase
Lu -- uppercase
Lt -- titlecase
Lm -- modifier
Lo -- other

Note that, for a single character, Ll <-> islower(); similarly Lu <-> isupper(); and (Lu or Lt) <-> istitle().
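To see these categories on concrete characters, here is a small sketch using the standard unicodedata module. It assumes Python 3, and the sample characters are chosen only for illustration.

```python
import unicodedata

# One sample character per category discussed above, plus punctuation.
samples = {
    "a": "Latin small letter a",
    "A": "Latin capital letter A",
    "\u01c5": "Latin capital D with small z with caron (titlecase)",
    "\u02b0": "modifier letter small h",
    "\u4e2d": "CJK ideograph",
    "!": "exclamation mark",
}

# Look up the two-letter general category of each sample.
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, description in samples.items():
    print(categories[ch], description)
```

The first letter of the category code is "L" for every kind of letter, which is why checking only that first letter cannot tell cased letters apart from uncased ones.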

You may wish to read the complicated discussion on casing, which includes some discussion of Lm letters.

Blindly treating all "letters" as cased is demonstrably wrong. The Lo category includes 45301 codepoints in the BMP (counted using Python 2.6). A large chunk of these would be Hangul Syllables, CJK Ideographs, and other East Asian characters -- very hard to understand how they might be considered "cased".
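As a rough check of that point, the sketch below (Python 3 assumed) looks at a few Lo characters and confirms that neither upper() nor lower() changes them. It also recounts Lo code points in the BMP; the exact count depends on the Unicode version bundled with your Python, so it will probably differ from the 2.6 figure quoted above.

```python
import unicodedata

# Sample Lo characters: a CJK ideograph, a Hangul syllable, Hebrew alef.
lo_samples = ["\u4e2d", "\ud55c", "\u05d0"]

# All of them are in category Lo ...
all_lo = all(unicodedata.category(ch) == "Lo" for ch in lo_samples)

# ... and none of them is changed by case conversion.
unchanged = all(ch.upper() == ch == ch.lower() for ch in lo_samples)

# Count Lo code points in the BMP for the Unicode data in this interpreter.
lo_count = sum(unicodedata.category(chr(i)) == "Lo" for i in range(0x10000))

print(all_lo, unchanged, lo_count)
```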

You might like to consider an alternative definition, based on the (unspecified) behaviour of "cased characters" that you expect. Here's a simple first attempt:

>>> cased = lambda c: c.upper() != c or c.lower() != c
>>> sum(cased(unichr(i)) for i in xrange(65536))
1970

Interestingly there are 1216 x Ll and 937 x Lu, a total of 2153 ... scope for further investigation of what Ll and Lu really mean.
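One possible way to turn this definition into the contains_cased() function the question asks for is sketched below, written for Python 3. The names is_cased and has_no_uppercase are illustrative only, and "cased" here means just "changed by upper() or lower()", which is an assumption rather than an official Unicode definition.

```python
def is_cased(ch):
    """True if converting the character to upper or lower case changes it."""
    return ch.upper() != ch or ch.lower() != ch


def contains_cased(text):
    """True if at least one character in text is cased by the test above."""
    return any(is_cased(ch) for ch in text)


def has_no_uppercase(text):
    """The filter from the question: pass strings with no uppercase characters."""
    # Strings without any cased character pass straight away.
    if not contains_cased(text):
        return True
    # Otherwise defer to islower(), which now has a cased character to judge.
    return text.islower()


for sample in ["!", "!@#$%^", "hello!", "Hello!", "\u4e2d\u6587"]:
    print(repr(sample), has_no_uppercase(sample))
```

With this definition, punctuation-only strings and CJK text pass the filter, while any string containing an uppercase letter is rejected.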

answered Oct 26 '22 01:10

John Machin

import unicodedata as ud

def contains_cased(u):
  return any(ud.category(c)[0] == 'L' for c in u)
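For comparison, the sketch below (Python 3 assumed) runs a letter-based check like the one above next to a check based on case conversion. The function names are illustrative. The two disagree on text made only of uncased letters, such as CJK ideographs, and that affects what the question's filter would return.

```python
import unicodedata as ud


def contains_letter(u):
    """True if any character is in a letter category (Ll, Lu, Lt, Lm, Lo)."""
    return any(ud.category(c)[0] == "L" for c in u)


def contains_cased_by_mapping(u):
    """True if any character is changed by upper() or lower()."""
    return any(c.upper() != c or c.lower() != c for c in u)


text = "\u4e2d\u6587"  # two CJK ideographs, category Lo

letter_result = contains_letter(text)
mapping_result = contains_cased_by_mapping(text)
islower_result = text.islower()

print(letter_result, mapping_result, islower_result)
```

Because the letter-based check answers True for this text, the filter would go on to call islower(), which returns False, and a string with no uppercase characters would be rejected.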

answered Oct 25 '22 23:10

Alex Martelli

Tags: python, uppercase, lowercase, unicode
