I want to be able to add a 'u' to a string that I only have through a variable. I need this because inside a for loop I can only access the string by its variable name, not as a literal.
Is there a way to do this?
>>> word = 'blahblah'
>>> list = ['blahblah', 'boy', 'cool']
>>> import marisa_trie
>>> trie = marisa_trie.Trie(list)
>>> word in trie
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Argument 'key' has incorrect type (expected unicode, got str)
>>> 'blahblah' in trie
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Argument 'key' has incorrect type (expected unicode, got str)
>>> u'blahblah' in trie
True
>>> u"blahblah" in trie
True
>>> u(word) in trie
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'u' is not defined
>>> uword in trie
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'uword' is not defined
>>> u+word in trie
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'u' is not defined
>>> word.u in trie
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'u'
To include Unicode characters in your Python source code, you can use Unicode escape sequences of the form \u0123 in your string. In Python 2.x, you also need to prefix the string literal with 'u'.
Inserting Unicode characters: to insert a Unicode character, type the character code, press ALT, and then press X. For example, to type a dollar symbol ($), type 0024, press ALT, and then press X. For more Unicode character codes, see the Unicode character code charts by script.
u'\xe9' is a Unicode string that contains the Unicode character U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
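A quick check of the escape forms mentioned above, written for Python 3 (where the u prefix is still accepted but optional): the \x and \u escapes name the same code point, and unicodedata reports its official name.

```python
import unicodedata

s1 = u'\xe9'    # \x escape: code point U+00E9
s2 = u'\u00e9'  # \u escape: the same code point, written with four hex digits

print(s1 == s2)               # True
print(hex(ord(s1)))           # 0xe9
print(unicodedata.name(s1))   # LATIN SMALL LETTER E WITH ACUTE
```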
If encoding and/or errors are given, unicode() will decode the object, which can be either an 8-bit string or a character buffer, using the codec for encoding. The encoding parameter is a string giving the name of an encoding; if the encoding is not known, LookupError is raised.
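The unicode() built-in no longer exists in Python 3; its closest counterpart is str(object, encoding, errors), which decodes a bytes object the same way. The sketch below, assuming Python 3 and made-up sample bytes, shows the encoding argument, the errors argument, and the LookupError for an unknown codec name.

```python
raw = b"caf\xc3\xa9"  # UTF-8 encoded bytes for the text "café"

# Decode with an explicit codec, like unicode(raw, "utf-8") in Python 2.
text = str(raw, encoding="utf-8")

# b"\xe9" alone is not valid UTF-8; errors="replace" turns it into U+FFFD
# instead of raising UnicodeDecodeError.
lossy = str(b"caf\xe9", encoding="utf-8", errors="replace")

# An unknown encoding name raises LookupError.
try:
    str(raw, encoding="no-such-codec")
    unknown_raised = False
except LookupError:
    unknown_raised = True

print(repr(text), repr(lossy), unknown_raised)
```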
You could decode:
lst = ['blahblah', 'boy', 'cool']
for word in lst:
    print(type(word.decode("utf-8")))
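The decode approach can be wrapped in a small helper that works on both Python 2 and Python 3, because bytes is an alias of str on Python 2.6+. The helper name to_text and the sample b'...' list are hypothetical; on Python 3 a plain 'blahblah' literal is already text, so only genuine byte strings need decoding.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string, decoding byte strings first."""
    # On Python 2, a plain 'blahblah' literal is bytes (str), so this branch runs;
    # on Python 3, only b'...' values take it.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text: return unchanged


# Hypothetical byte-string data standing in for the question's list.
lst = [b'blahblah', b'boy', b'cool']
words = [to_text(w) for w in lst]
print(words)  # ['blahblah', 'boy', 'cool']
```

With marisa_trie, the same idea applies on both sides: build the trie from the converted list, e.g. marisa_trie.Trie(words), and test membership with to_text(word) in trie so that the key is always a text string.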
Or use the unicode function:
for word in lst:
    print(type(unicode(word, encoding="utf-8")))
Or str.format:
for word in lst:
    print(type(u"{}".format(word)))
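A caveat on the str.format trick: on Python 2, formatting a byte string into a u"" template implicitly decodes it with the default ASCII codec, so it only works for ASCII data. On Python 3, formatting a bytes object does not decode it at all; it embeds the repr, prefix and quotes included. The short check below (Python 3, sample values assumed) shows why decoding explicitly is the safer habit.

```python
word_bytes = b"blahblah"

# Formatting bytes embeds their repr, not the decoded text.
formatted_bytes = "{}".format(word_bytes)

# Decoding first yields the plain text.
formatted_text = "{}".format(word_bytes.decode("utf-8"))

print(formatted_bytes)  # b'blahblah'
print(formatted_text)   # blahblah
```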