Python3 .title() of utf-8 strings

So I have a string:

amélie

Encoded as UTF-8, it is b'ame\xcc\x81lie'.

The bytes \xcc\x81 are U+0301 COMBINING ACUTE ACCENT, which modifies the preceding character (http://www.fileformat.info/info/unicode/char/0301/index.htm), so the string is:

u'ame\u0301lie'

When I call 'amélie'.title() on that string, I get 'AméLie', which makes no sense to me.

I know I can do a workaround, but is this intended behavior or a bug? I would expect the "l" to NOT get capitalized.

Another experiment:

  In [1]: [ord(c) for c in 'amélie'.title()]
  Out[1]: [65, 109, 101, 769, 76, 105, 101]

  In [2]: [ord(c) for c in 'amélie']
  Out[2]: [97, 109, 101, 769, 108, 105, 101]

lqdc asked Sep 02 '15 04:09





1 Answer

Take a look at these questions: "Python title() with apostrophes" and "Titlecasing a string with exceptions".

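Basically, this looks like a limitation of the built-in str.title() method, which is very liberal about what it considers a word boundary: it treats a word as a group of consecutive letters, so any non-letter, including an apostrophe or a combining mark such as U+0301, ends the current word, and the next letter is capitalized.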
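
To see why the combining accent acts as a boundary, here is a small sketch using the standard unicodedata module. The variable names are only for illustration; the string is the decomposed form from the question.

```python
import unicodedata

# Decomposed form from the question: 'e' followed by U+0301.
s = 'ame\u0301lie'
accent = '\u0301'

# U+0301 is a nonspacing mark (category 'Mn'), not a letter.
accent_category = unicodedata.category(accent)
accent_is_letter = accent.isalpha()

# title() capitalizes the first letter after any non-letter,
# so the 'l' following the accent starts a new "word".
titled = s.title()

print(accent_category)   # Mn
print(accent_is_letter)  # False
print([ord(c) for c in titled])
```

The last line prints the same code points as Out[1] in the question, with 76 ('L') right after 769.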

You can use string.capwords:

import string
string.capwords('amélie')
Out[18]: 'Amélie'
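
string.capwords works here because it only splits on whitespace. As a rough sketch of what it does with the default separator (capwords_sketch is a hypothetical name, not part of the standard library):

```python
import string

def capwords_sketch(text):
    """Split on whitespace, capitalize each word, join with single spaces."""
    return ' '.join(word.capitalize() for word in text.split())

# Decomposed accent plus an apostrophe and a double space.
sample = "ame\u0301lie  o'neil"
sketch_result = capwords_sketch(sample)
library_result = string.capwords(sample)

print(sketch_result == library_result)  # True
```

Note the side effects: runs of whitespace collapse to a single space, and everything after the first character of each word is lowercased, so this may not suit every input.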

Another option is to use the precomposed character é (U+00E9, b'\xc3\xa9' in UTF-8), a single code point with the accent built in. Because it is itself a letter, it does not split the word:

b'am\xc3\xa9lie'.decode().title()
Out[21]: 'Amélie'
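
If the string arrives in decomposed form, one way to get the precomposed character is NFC normalization via unicodedata.normalize. This is a minimal sketch; title_nfc is a hypothetical helper name, and it only helps where a precomposed code point exists for the letter and accent in question.

```python
import unicodedata

def title_nfc(text):
    """Compose combining sequences (NFC) before title-casing."""
    return unicodedata.normalize('NFC', text).title()

decomposed = 'ame\u0301lie'      # 'e' + U+0301, as in the question
result = title_nfc(decomposed)

print(result == 'Am\u00e9lie')   # True
```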

maxymoo answered Oct 11 '22 15:10
