
How does string formatting with % work with Unicode in Python?

I have a question about Unicode strings and string formatting with % in Python. I have the following four cases:

  1. case:

    # -*- encoding: utf -*-
    print '%s' % 'München'
    
  2. case:

    # -*- encoding: utf -*-
    print '%s' % u'München'
    
  3. case:

    # -*- encoding: utf -*-
    print u'%s' % u'München'
    
  4. case:

    # -*- encoding: utf -*-
    print u'%s' % 'München'
    

Cases 1-3 work fine, but in case 4 I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

So my questions are: why do cases 1-3 work (especially case 2), and why does case 4 fail?

I know how to fix my problem but I want to understand why this problem happens, so I would be happy if someone could help me. Thanks!

PS: Thanks for the links to possible duplicates, but sadly my question isn't answered by "Why does Python 2.x throw an exception with string formatting + unicode?" because that question doesn't use a Unicode object as the format string. It covers cases 1 and 2 but not case 4, and case 2 in particular works for me but breaks for them...


1 Answer

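In cases 2 and 4, a byte string is mixed with a Unicode string, so Python 2 implicitly coerces the byte string to Unicode using the default ascii codec. In case 2 the byte string is the format string '%s', which is pure ASCII and can be converted with that codec. In case 4 it is 'München', whose UTF-8 encoding contains the byte 0xc3, so it cannot.

In cases 1 and 3, both operands are byte strings or both are Unicode strings, so no coercion is required.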
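The implicit step can be spelled out by hand. The following is a minimal sketch, not what the interpreter literally runs: it performs the ascii decode explicitly for cases 2 and 4, then shows one possible fix. The variable names are made up for illustration, and the byte literal assumes the source file was saved as UTF-8, as in the question.

```python
# -*- coding: utf-8 -*-
# Explicit version of the ascii decode that Python 2 applies implicitly
# when a byte string meets a unicode string in a % operation.
# All names below are illustrative only.

fmt_bytes = b'%s'               # case 2: the format string is the byte string
arg_bytes = b'M\xc3\xbcnchen'   # case 4: 'München' as UTF-8 bytes (assumed)

# Case 2: the format string is pure ASCII, so the decode succeeds.
fmt_text = fmt_bytes.decode('ascii')
case2 = fmt_text % u'M\xfcnchen'

# Case 4: the argument contains 0xc3 at index 1, which ascii cannot decode.
try:
    arg_bytes.decode('ascii')
    case4_error = None
except UnicodeDecodeError as exc:
    case4_error = exc

# One possible fix: decode with the codec the bytes were written in.
fixed = u'%s' % arg_bytes.decode('utf-8')
```

Here `case4_error` is the same UnicodeDecodeError reported in the question, pointing at byte 0xc3 in position 1, while `case2` and `fixed` both hold the Unicode string for München.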

Mark Tolonen, answered May 27 '26 17:05


