
Python - How can I do a string find on a Unicode character that is a variable?

Tags: python, unicode

This works:

s = 'jiā'
s.find(u'\u0101')

How do I do something like this:

s = 'jiā'
zzz = '\u0101'
s.find(zzz)

Since I'm using a variable now, how do I indicate that the string held by the variable is Unicode?

Steve asked Nov 11 '11 at 16:11




2 Answers

Since I'm using a variable now, how do I indicate the string represented by the variable is Unicode?

By defining it as a Unicode string in the first place.

zzz = u"foo"

Or, if you already have a string in some other encoding, by converting it to Unicode (the original encoding must be specified if the string is non-ASCII).

zzz = unicode(zzz, encoding="latin1")

Or by using Python 3, where all strings are Unicode (the u prefix and unicode() above are Python 2).
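
As a minimal sketch of the Python 3 case, assuming the code runs under Python 3 (the question itself appears to use Python 2): the search string can sit in an ordinary variable, because every str literal is already Unicode. The text 'jiā' is written with an escape here only to make the code point explicit.

```python
# Python 3 sketch: str is always Unicode, so no u prefix is needed.
s = 'ji\u0101'   # same text as 'jiā', spelled with an escape
zzz = '\u0101'   # one character: LATIN SMALL LETTER A WITH MACRON

index = s.find(zzz)
print(len(zzz))  # 1
print(index)     # 2
```

The variable behaves exactly like the inline literal from the question, so s.find(zzz) needs no extra marking.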

kindall answered Oct 24 '22 at 17:10


zzz as defined in your post is a plain str object, not a unicode object, so there is no way to indicate that it is something it actually isn't. You can convert the str object to a unicode object, though, by specifying an encoding:

s.find(zzz.decode("utf-8"))

Substitute utf-8 with whatever encoding the string is actually encoded in.
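
A rough Python 3 equivalent of this decoding step, where the bytes type plays the role of the Python 2 str: the byte values below are an assumption (UTF-8 encoded input) chosen for illustration, not data from the question.

```python
# Assumed input: both the haystack and the needle arrive as UTF-8 bytes.
raw = b'ji\xc4\x81'          # UTF-8 encoding of 'jiā'
needle_bytes = b'\xc4\x81'   # UTF-8 encoding of U+0101

# Decode both to Unicode text before searching.
s = raw.decode('utf-8')
zzz = needle_bytes.decode('utf-8')

index = s.find(zzz)
print(index)  # 2
```

Decoding with the wrong encoding either raises UnicodeDecodeError or silently yields different characters, which is why the encoding has to be known rather than guessed.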

Note that in your example

zzz = '\u0101'

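zzz is a plain byte string of length 6: a backslash followed by the characters u0101. In Python 2, \u escapes are only interpreted inside Unicode literals (those with a u prefix). There is no easy way to fix this wrong string literal afterwards, except for hacks along the lines of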

import ast
ast.literal_eval("u'" + zzz + "'")
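
For illustration, here is how that situation might look under Python 3, assuming the six-character escape text reached the program as data (for example, read from a file). A raw string stands in for it below. The unicode_escape route is an alternative that is not part of the answer above; both are sketches rather than recommended practice.

```python
import ast

# Six literal characters: backslash, u, 0, 1, 0, 1 (not the character ā).
zzz = r'\u0101'
print(len(zzz))  # 6

# Hack from the answer: wrap the text in quotes and evaluate it as a literal.
# This breaks if zzz itself contains a quote character.
via_literal_eval = ast.literal_eval("'" + zzz + "'")

# Alternative: interpret the escape with the unicode_escape codec.
via_codec = zzz.encode('ascii').decode('unicode_escape')

s = 'ji\u0101'
index = s.find(via_literal_eval)
print(index)  # 2
```

Both conversions yield the single character U+0101, after which find behaves as in the working example from the question.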

Sven Marnach answered Oct 24 '22 at 15:10