Python3 .title() of utf-8 strings

Tags:

python

python-3.x

So I have a string:

amélie

In bytes it is b'ame\xcc\x81lie'

In UTF-8, the bytes \xcc\x81 encode U+0301, the combining acute accent, which applies to the previous character: http://www.fileformat.info/info/unicode/char/0301/index.htm

u'ame\u0301lie'

When I do: 'amélie'.title() on that string, I get 'AméLie', which makes no sense to me.

I know I can do a workaround, but is this intended behavior or a bug? I would expect the "l" to NOT get capitalized.

Another experiment:

  In [1]: [ord(c) for c in 'amélie'.title()]
  Out[1]: [65, 109, 101, 769, 76, 105, 101]

  In [2]: [ord(c) for c in 'amélie']
  Out[2]: [97, 109, 101, 769, 108, 105, 101]


asked Sep 02 '15 04:09

lqdc

1 Answer

Take a look at these questions: "Python title() with apostrophes" and "Titlecasing a string with exceptions".

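Basically this looks like a limitation of the built-in title method, which is very liberal about what it considers a word boundary: the documentation describes a word as a group of consecutive letters, and the combining accent U+0301 is a mark rather than a letter, so the "l" after it is treated as the start of a new word.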
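To see why the accent ends up acting as a boundary, it can help to inspect each code point of the decomposed string. This is only a minimal sketch; the helper name describe is made up for illustration and is not part of any library.

```python
import unicodedata

def describe(text):
    """Return (Unicode name, general category, isalpha) for each code point."""
    return [(unicodedata.name(c), unicodedata.category(c), c.isalpha()) for c in text]

# The decomposed form from the question: 'e' followed by U+0301.
decomposed = 'ame\u0301lie'

for row in describe(decomposed):
    print(row)
```

The row for U+0301 reports category Mn (nonspacing mark) and isalpha() False, while every other row is a lowercase letter, which matches the capital "L" showing up right after the accent.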

You can use string.capwords, which splits on whitespace only and capitalizes each piece:

import string
string.capwords('amélie')
Out[18]: 'Amélie'

Another thing you could do is use the precomposed character é (U+00E9, UTF-8 bytes '\xc3\xa9'), which is a single code point for an e with the accent built in:

b'am\xc3\xa9lie'.decode().title()
Out[21]: 'Amélie'
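If you cannot control how the input is encoded, normalizing to NFC before calling title() should have the same effect, since NFC composes e + U+0301 into the single character é. A minimal sketch, where title_nfc is a hypothetical helper name rather than anything from the standard library:

```python
import unicodedata

def title_nfc(text):
    """Compose combining sequences (NFC), then title-case the result."""
    return unicodedata.normalize('NFC', text).title()

decomposed = 'ame\u0301lie'          # 7 code points: 'e' + combining accent
composed = unicodedata.normalize('NFC', decomposed)

print(len(decomposed), len(composed))  # 7 6
print(title_nfc(decomposed))           # Amélie
```

This only helps where a precomposed character exists; a base letter plus a mark with no composed form would presumably still be split by title(), so string.capwords remains the safer choice for arbitrary input.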


answered Oct 11 '22 15:10

maxymoo
