Get the number of bytes needed for a Unicode string

Question

I have a Korean Unicode string such as u'정정'. How can I find out how many bytes are needed to represent it?

I need the exact byte count because I'm sending the string in an iOS push notification, which limits the size of the payload.

len(u'정정') doesn't work because it returns the number of characters, not the number of bytes.

Martijn Pieters · Accepted Answer

You need to decide which encoding you want to measure the byte size in:

>>> print u'\uC815\uC815'
정정
>>> print len(u'\uC815\uC815')
2
>>> print len(u'\uC815\uC815'.encode('UTF-8'))
6
>>> print len(u'\uC815\uC815'.encode('UTF-16-LE'))
4
>>> print len(u'\uC815\uC815'.encode('UTF-16'))
6
>>> print len(u'\uC815\uC815'.encode('UTF-32-LE'))
8
>>> print len(u'\uC815\uC815'.encode('UTF-32'))
12
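The session above is Python 2. As a minimal sketch of the same measurement in Python 3 (where `str` is already Unicode), the byte count is simply the length of the encoded result. The helper names `byte_length` and `fits_in_limit` and the `max_bytes` parameter are hypothetical, introduced here for illustration only; the actual payload limit depends on the push service and is not assumed. Note that the `UTF-16` and `UTF-32` codecs without an explicit byte order prepend a byte order mark (2 and 4 bytes respectively), which is why they report 6 and 12 bytes rather than 4 and 8.

```python
def byte_length(text, encoding="utf-8"):
    """Return the number of bytes `text` occupies once encoded."""
    return len(text.encode(encoding))


def fits_in_limit(text, max_bytes, encoding="utf-8"):
    """Hypothetical helper: True if the encoded text needs at most max_bytes."""
    return byte_length(text, encoding) <= max_bytes


text = "\uC815\uC815"  # the two Korean characters from the question

print(len(text))                       # 2 code points
print(byte_length(text))               # 6 bytes in UTF-8
print(byte_length(text, "utf-16-le"))  # 4 bytes, no byte order mark
print(byte_length(text, "utf-16"))     # 6 bytes, 2 of them the byte order mark
print(byte_length(text, "utf-32"))     # 12 bytes, 4 of them the byte order mark
```

If the limit applies to a whole serialized payload rather than to the string alone, the same approach would be applied to the encoded payload instead.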

You really want to review the Python Unicode HOWTO to fully appreciate the difference between a unicode object and its byte encoding.
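To make that difference concrete, here is a small sketch, assuming Python 3, where `str` plays the role of Python 2's `unicode` and `bytes` holds the encoded form. The variable names are illustrative only.

```python
text = "\uC815\uC815"        # text: a sequence of Unicode code points
data = text.encode("utf-8")  # bytes: one particular encoding of that text

print(type(text).__name__, len(text))  # str 2
print(type(data).__name__, len(data))  # bytes 6

# Decoding with the same encoding recovers the original text.
assert data.decode("utf-8") == text
```

The length of the text never changes, while the length of the bytes depends entirely on which encoding was chosen.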

Another excellent article is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), by Joel Spolsky (one of the people behind Stack Overflow).

Tags:

python

string

unicode

cjk

jasondinh
