 

Detect strings with non English characters in Python

I have some strings that contain a mix of English and non-English characters. For example:

```python
w = '_1991_اف_جي2'
```

How can I recognize these kinds of strings using a regex or some other fast method in Python?

I'd prefer not to compare the characters of the string one by one against a list of letters, but to do this in one shot, quickly.
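Since the question asks about regex specifically, here is a minimal sketch of a one-shot regex check. It assumes "English" means 7-bit ASCII (U+0000 to U+007F), and the names NON_ASCII and has_non_ascii are hypothetical, chosen for illustration. A single re.search() call stops at the first character outside that range, so the scan happens inside the regex engine rather than in a Python loop.

```python
import re

# Hypothetical pattern: matches any single character outside 7-bit ASCII (U+0000..U+007F).
NON_ASCII = re.compile(r'[^\x00-\x7f]')

def has_non_ascii(s):
    """Return True if s contains at least one non-ASCII character (hypothetical helper)."""
    # search() returns at the first match, so there is no per-character Python loop.
    return NON_ASCII.search(s) is not None

w = '_1991_اف_جي2'
print(has_non_ascii(w))          # True
print(has_non_ascii('_1991_2'))  # False
```

Under this assumption, accented Latin letters such as "í" also count as non-English, because they fall outside ASCII.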

Asked Nov 23 '14 at 01:11 by TJ1





1 Answer

You can simply check whether the string can be encoded using only ASCII characters (the unaccented Latin alphabet plus digits, punctuation, and control characters). If it cannot, the string contains characters outside that set, such as letters from another alphabet.

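Note the # -*- coding: utf-8 -*- comment. Under Python 2 it must appear at the top of the source file; otherwise the non-ASCII string literals cause a SyntaxError about a missing encoding declaration. Python 3 reads source files as UTF-8 by default, so there the comment is optional.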

```python
# -*- coding: utf-8 -*-

def isEnglish(s):
    try:
        # Any non-ASCII character becomes one or more bytes >= 0x80 in UTF-8,
        # and the ASCII codec refuses to decode those bytes.
        s.encode(encoding='utf-8').decode('ascii')
    except UnicodeDecodeError:
        return False
    else:
        return True

assert not isEnglish('slabiky, ale liší se podle významu')
assert isEnglish('English')
assert not isEnglish('ގެ ފުރަތަމަ ދެ އަކުރު ކަ')
assert not isEnglish('how about this one : 通 asfަ')
assert isEnglish('?fd4))45s&')
```
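On Python 3 the same ASCII test can be written more directly. The sketch below is an alternative, not the answer's original code, and the function names is_ascii_only and is_ascii_only_encode are hypothetical. str.isascii() exists only on Python 3.7 and later; on older Python 3 versions, encoding the str straight to ASCII gives the same result without the UTF-8 round trip.

```python
def is_ascii_only(s):
    """Return True if every character in s is 7-bit ASCII (Python 3.7+)."""
    return s.isascii()

def is_ascii_only_encode(s):
    """Same test for Python 3 versions without str.isascii()."""
    try:
        # Encoding to ASCII raises UnicodeEncodeError at the first
        # character above U+007F.
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True

print(is_ascii_only('_1991_اف_جي2'))      # False
print(is_ascii_only_encode('?fd4))45s&'))  # True
```

Both variants classify strings exactly like isEnglish above: accented Latin letters count as non-English, and an empty string counts as English.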
Answered Oct 05 '22 at 08:10 by Salvador Dali
