
Python string encoding for a variable

I know that in Python < 3, the string 'Plants vs. Zombies䋢 2' can be encoded as UTF-8 like this:

u"Plants vs. Zombies䋢 2".encode("utf-8")

What if I have a variable (say appName) instead of a string literal? Can I do it like this:

 appName = "Plants vs. Zombies䋢 2"
 u+appName.encode("utf-8")

When I try:

 appName = appName.encode('utf-8')

I get this error:

 'ascii' codec can't decode byte 0xe4 in position 18: ordinal not in range(128)
Siddharthan Asokan asked Nov 25 '13 21:11





2 Answers

No. The u notation is only for string literals. Variables containing string data don't need the u, because the variable contains an object that is either a unicode string or a byte string. (I'm assuming here that appName contains string data; if it doesn't, it doesn't make sense to try to encode it. Convert it to a bytestring or unicode first.)

So your variable either contains a unicode string or a byte string. If it is a unicode string you can just do appName.encode("utf-8").

If it is a byte string then it is already encoded with some encoding. If it's already encoded as UTF-8, then it's already how you want it and you don't need to do anything. If it's in some other encoding and you want to get it into UTF-8, you can do appName.decode('the-existing-encoding').encode("utf-8").
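The two cases above can be sketched as one small helper. This is only an illustration, not a standard function: to_utf8 and source_encoding are names made up here, and the caller is assumed to know the real encoding of any byte string it passes in. The code is written to behave the same on Python 2.6+ and Python 3.3+, where bytes is the byte string type (an alias of str on Python 2).

```python
def to_utf8(value, source_encoding):
    """Return value as UTF-8 encoded bytes.

    Hypothetical helper: source_encoding is only used when value is
    already a byte string, and must be its actual current encoding.
    """
    if isinstance(value, bytes):
        # Byte string: decode from its existing encoding, then re-encode.
        return value.decode(source_encoding).encode("utf-8")
    # Unicode string: it can be encoded directly.
    return value.encode("utf-8")


from_unicode = to_utf8(u"caf\xe9", "latin-1")  # unicode input, encoding argument unused
from_latin1 = to_utf8(b"caf\xe9", "latin-1")   # Latin-1 bytes, re-encoded as UTF-8
```

Both calls produce the same UTF-8 bytes. If the byte string is already UTF-8, passing "utf-8" as source_encoding simply round-trips it unchanged.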

Note that if you do what you show in your edited question, the result might not be what you expect. You have:

appName = "Plants vs. Zombies䋢 2"

Without the u on the string literal, you have created a bytestring in some encoding, namely the encoding of your source file. If your source file isn't in UTF-8, then you're in the last situation I described above. There is no way to "just make a string unicode" after you have created it as non-unicode. When you create it as non-unicode, you are creating it in a particular encoding, and you have to know what encoding that is in order to decode it to unicode (so you can then encode it to another encoding if you want).
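This is also where the 'ascii' codec error in the question comes from. In Python 2, calling .encode() on a byte string first decodes it implicitly with the ASCII codec, and that step fails on the first non-ASCII byte. The sketch below makes the implicit step explicit so that it also runs on Python 3; it assumes the source file was saved as UTF-8, which matches the 0xe4 byte reported in the error message.

```python
# The bytes a UTF-8 source file would store for the un-prefixed literal
# "Plants vs. Zombies䋢 2" (assumption: the file is saved as UTF-8).
raw = b"Plants vs. Zombies\xe4\x8b\xa2 2"

try:
    # What Python 2 does behind the scenes before encoding a byte string.
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # byte 0xe4 at index 18 is not valid ASCII

# Decoding with the encoding the bytes are really in works,
# and the result can then be encoded to any target encoding.
text = raw.decode("utf-8")
utf8_again = text.encode("utf-8")
```

Here text is a unicode string and utf8_again is identical to raw, which is why no conversion is needed when the bytes are already UTF-8.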

BrenBarn answered Sep 28 '22 17:09



No. The u prefix modifies the meaning of a string constant (making it a unicode constant). It is not an operator (which could be applied to any expression).
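A short sketch of what the interpreter actually sees may help. It assumes no variable named u has been defined, and app_name is just an illustrative name standing in for appName.

```python
app_name = u"Plants vs. Zombies\u42e2 2"  # u prefix on a literal: a unicode string

try:
    # Parsed as: (a variable named u) + app_name.encode("utf-8")
    encoded = u + app_name.encode("utf-8")
except NameError:
    # There is no variable called u, so the expression fails.
    # The variable already holds a unicode object, so encode it directly.
    encoded = app_name.encode("utf-8")
```

The prefix only has meaning to the parser when it is attached to a quoted literal; once the value is in a variable, its type is already fixed.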

greggo answered Sep 28 '22 17:09
