I have some strings that contain a mix of English and non-English letters. For example:
w='_1991_اف_جي2'
How can I recognize strings like this using a regex or any other fast method in Python?
I prefer not to compare letters of the string one by one with a list of letters, but to do this in one shot and quickly.
To check whether a string contains only alphabetic characters, call isalpha() on it. The method returns a boolean: True if the string is non-empty and every character is a letter, False otherwise. Note that in Python 3, isalpha() accepts letters from any script, not only English, so on its own it cannot tell English text from non-English text.
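To make the isalpha() behavior concrete, here is a minimal sketch comparing it with a regex restricted to ASCII letters. The names ENGLISH_LETTERS and is_english_alpha are hypothetical helpers invented for this illustration, and "English letters" is assumed to mean A-Z and a-z only.

```python
import re

# Hypothetical helper: matches one or more ASCII letters (A-Z, a-z) only.
ENGLISH_LETTERS = re.compile(r"[A-Za-z]+")

def is_english_alpha(s):
    """Return True if s is non-empty and consists only of ASCII letters."""
    return ENGLISH_LETTERS.fullmatch(s) is not None

# isalpha() accepts letters from any script, so it cannot separate
# English from non-English words.
print("English".isalpha())           # letters only
print("liší".isalpha())              # Czech letters also count as alphabetic

# The ASCII-only regex distinguishes the two.
print(is_english_alpha("English"))
print(is_english_alpha("liší"))
```

The regex is compiled once and applied with fullmatch(), so the whole string is checked in a single call rather than letter by letter in Python code.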
You can simply check whether the string can be encoded using only ASCII characters (the Latin alphabet plus digits, punctuation, and control characters). If it cannot, it contains at least one character outside ASCII, such as a letter from another alphabet.
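Note the comment # -*- coding: ... at the top of the Python file. Python 2 requires it because the source contains non-ASCII string literals (otherwise you get a SyntaxError about encoding). Python 3 assumes UTF-8 source by default, so the comment is optional there.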
# -*- coding: utf-8 -*-
def isEnglish(s):
    try:
        s.encode(encoding='utf-8').decode('ascii')
    except UnicodeDecodeError:
        return False
    else:
        return True

assert not isEnglish('slabiky, ale liší se podle významu')
assert isEnglish('English')
assert not isEnglish('ގެ ފުރަތަމަ ދެ އަކުރު ކަ')
assert not isEnglish('how about this one : 通 asfަ')
assert isEnglish('?fd4))45s&')
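Since the question also asks about regex, the same ASCII test can be sketched in two other ways. This assumes Python 3.7 or later for str.isascii(). The names NON_ASCII, is_ascii_only, and has_non_ascii are hypothetical and not part of the answer above.

```python
import re

# Hypothetical pattern: any single character outside the ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")

def is_ascii_only(s):
    """True if every character is ASCII (str.isascii needs Python 3.7+)."""
    return s.isascii()

def has_non_ascii(s):
    """True if the string contains at least one non-ASCII character."""
    return NON_ASCII.search(s) is not None

w = '_1991_اف_جي2'
print(is_ascii_only(w))    # the Arabic letters are outside ASCII
print(has_non_ascii(w))
print(is_ascii_only('?fd4))45s&'))
```

Both variants treat digits and punctuation as "English", just like the encode/decode version, and neither needs a try/except block. Note that an empty string counts as ASCII-only under all three approaches.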