Detect strings with non English characters in Python

I have some strings that contain a mix of English and non-English letters. For example:

w='_1991_اف_جي2'

How can I detect this kind of string using a regex or some other fast method in Python?

I prefer not to compare letters of the string one by one with a list of letters, but to do this in one shot and quickly.

239

asked Nov 23 '14 01:11

TJ1

1 Answer

You can check whether the string can be encoded using only ASCII characters (the basic Latin alphabet plus digits, punctuation, and control characters). If it cannot, it contains characters from some other alphabet.

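Note the comment # -*- coding: utf-8 -*- at the top of the file. Python 2 needs it whenever the source contains non-ASCII literals (otherwise you get a SyntaxError about encoding). Python 3 assumes UTF-8 source by default, so there it is optional.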

# -*- coding: utf-8 -*-

def isEnglish(s):
    # Encode to UTF-8 bytes, then try to read those bytes back as ASCII;
    # any non-ASCII character makes the decode step fail.
    try:
        s.encode(encoding='utf-8').decode('ascii')
    except UnicodeDecodeError:
        return False
    else:
        return True

assert not isEnglish('slabiky, ale liší se podle významu')
assert isEnglish('English')
assert not isEnglish('ގެ ފުރަތަމަ ދެ އަކުރު ކަ')
assert not isEnglish('how about this one : 通 asfަ')
assert isEnglish('?fd4))45s&')
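
Since the question asks for a regex or another one-shot method, here is a minimal sketch of two equivalent checks, assuming Python 3.7 or later for str.isascii(). The names NON_ASCII, is_ascii_regex and is_ascii_builtin are made up for this illustration and are not part of the answer above. Both treat "English" as "ASCII only", the same assumption isEnglish makes, so an English word with an accent such as 'café' is reported as non-English.

```python
import re

# Hypothetical helper names, for illustration only.
# The pattern matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r'[^\x00-\x7F]')


def is_ascii_regex(s):
    """Return True if s contains only ASCII characters (regex search)."""
    return NON_ASCII.search(s) is None


def is_ascii_builtin(s):
    """Same check using str.isascii(), which needs Python 3.7+."""
    return s.isascii()


w = '_1991_اف_جي2'  # example string from the question
print(is_ascii_regex(w), is_ascii_builtin(w))
print(is_ascii_regex('English'), is_ascii_builtin('English'))
```

The first print should show False for both checks and the second True for both. The regex form is the more flexible one: the character class can be swapped for a different range if "English" should mean something narrower than all of ASCII.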

141

answered Oct 05 '22 08:10

Salvador Dali

Tags: python, regex, non-english
