How can I determine the byte length of a utf-8 encoded string in Python?

Question

I am working with Amazon S3 uploads and am having trouble with key names being too long. S3 limits the length of the key by bytes, not characters.

From the docs:

The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long.

I also attempt to embed metadata in the file name, so I need to be able to calculate the current byte length of the string using Python to make sure the metadata does not make the key too long (in which case I would have to use a separate metadata file).

How can I determine the byte length of the utf-8 encoded string? Again, I am not interested in the character length... rather the actual byte length used to store the string.

Dietrich Epp · Accepted Answer

def utf8len(s):
    return len(s.encode('utf-8'))

Works in both Python 2 and 3, as long as s is a text (unicode) string rather than an already-encoded byte string.
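Applied to the S3 case from the question, a minimal sketch might look like the following. The 1024-byte limit comes from the docs quoted in the question. The names S3_MAX_KEY_BYTES and key_with_metadata are hypothetical, chosen only for illustration, and the sketch assumes the metadata is simply appended to the base key.

```python
S3_MAX_KEY_BYTES = 1024  # limit quoted from the S3 docs in the question


def utf8len(s):
    """Return the number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode('utf-8'))


def key_with_metadata(base, metadata):
    """Hypothetical helper: return base + metadata if the combined key
    fits within the byte limit, otherwise None so the caller can fall
    back to a separate metadata file."""
    candidate = base + metadata
    if utf8len(candidate) <= S3_MAX_KEY_BYTES:
        return candidate
    return None
```

A None result signals that the metadata would push the key past the limit, which is the case where the question proposes a separate metadata file.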

Mark Reed · Answer

Use the string's encode() method to convert the character string to a byte string, then call len() on the result as usual:

>>> s = u"¡Hola, mundo!"
>>> len(s)  # characters
13
>>> len(s.encode('utf-8'))  # bytes
14
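The two counts differ because UTF-8 uses between one and four bytes per character. The following sketch (Python 3 syntax; the sample characters are picked only for illustration) shows the width of one character from each size class:

```python
# One sample character per UTF-8 width class (illustrative choices).
samples = ["a", "\u00a1", "\u20ac", "\U0001F600"]  # a, ¡, €, 😀

# Map each character to the number of bytes in its UTF-8 encoding.
widths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in widths.items():
    print(repr(ch), n)
```

A key made of plain ASCII therefore has equal character and byte lengths, while accented, non-Latin, or emoji characters each count for more than one byte against the limit.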

Tags:

python

unicode

utf-8

user319862
