This works:
s = 'jiā'
s.find(u'\u0101')
How do I do something like this:
s = 'jiā'
zzz = '\u0101'
s.find(zzz)
Since I'm using a variable now, how do I indicate the string represented by the variable is Unicode?
In Python, the built-in functions chr() and ord() convert between Unicode code points and characters. A character can also be written inside a string literal as a hexadecimal code point escape, using \x, \u, or \U.
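As a quick illustration of the conversions described above, here is a minimal sketch, assuming Python 3 (where every str is Unicode). The variable names are only for illustration.

```python
# ord() maps a one-character string to its code point; chr() goes the other way.
code_point = ord('\u0101')      # U+0101, LATIN SMALL LETTER A WITH MACRON
char = chr(0x0101)

# \x takes 2 hex digits, \u takes 4, \U takes 8.
same_char = '\u0101' == '\U00000101' == char
short_escape = '\xe9' == '\u00e9'   # both spell U+00E9

print(code_point, same_char, short_escape)
```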
String find() in Python: call the method on the string object to search for a substring, like so: obj.find("search"). The find() method returns the index of the first occurrence of the query string, or -1 if the string is not found.
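A small sketch of that behaviour, with made-up sample strings:

```python
text = "hello world"

found = text.find("world")    # index where the first match starts
missing = text.find("xyz")    # no match, so find() returns -1

print(found, missing)
```

Because -1 is also a valid (negative) index, it is safer to compare the result against -1 explicitly than to use it directly for slicing.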
Since I'm using a variable now, how do I indicate the string represented by the variable is Unicode?
By defining it as a Unicode string in the first place.
zzz = u"foo"
Or, if you already have a string in some other encoding, by converting it to Unicode (the original encoding must be specified if the string is non-ASCII).
zzz = unicode(zzz, encoding="latin1")
Or by using Python 3 where all strings are Unicode.
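For comparison, this is roughly what the code from the question looks like under Python 3. It is only a sketch; the string s is written with an escape here so that the example does not depend on the source file encoding.

```python
# Python 3: every str is Unicode, so a variable needs no special marking.
s = 'ji\u0101'       # 'jiā'
zzz = '\u0101'       # the escape is interpreted: one character, 'ā'

index = s.find(zzz)
print(len(zzz), index)
```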
zzz as defined in your post is a plain str object, not a unicode object, so there is no way to indicate that it is something it actually isn't. You can convert the str object to a unicode object, though, by specifying an encoding:
s.find(zzz.decode("utf-8"))
Substitute utf-8 with whatever encoding the string is actually encoded in.
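The same idea carries over to Python 3, where the encoded form is a bytes object rather than a str. A minimal sketch, assuming the data is UTF-8 (the names raw and needle are hypothetical):

```python
# UTF-8 encodes U+0101 ('ā') as the two bytes C4 81.
raw = b'ji\xc4\x81'       # encoded form of 'jiā'
needle = b'\xc4\x81'      # encoded form of 'ā'

# Decode both sides to text before searching.
s = raw.decode('utf-8')
index = s.find(needle.decode('utf-8'))

print(len(raw), len(s), index)
```

Note that the byte string is four bytes long while the decoded text is three characters, which is why searching on the decoded form gives character positions.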
Note that in your example
zzz = '\u0101'
zzz is a plain string of length 6, because a Python 2 str literal does not interpret \u escapes. There is no easy way to fix this wrong string literal afterwards, except for hacks along the lines of
ast.literal_eval("u'" + zzz + "'")
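Here is a runnable sketch of that hack, assuming Python 3 and using a raw string to reproduce the six-character value. It also shows the unicode_escape codec as one possible alternative; that alternative is my suggestion, not part of the answer above.

```python
import ast

# Six literal characters: backslash, 'u', '0', '1', '0', '1'.
zzz = r'\u0101'

# The hack: build a string literal around the text and let Python parse it.
# This breaks if zzz itself contains a quote character.
fixed_eval = ast.literal_eval("u'" + zzz + "'")

# Alternative: interpret the escape sequence with the unicode_escape codec.
# encode('ascii') assumes zzz holds only ASCII characters.
fixed_codec = zzz.encode('ascii').decode('unicode_escape')

print(len(zzz), fixed_eval == fixed_codec)
```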