I am using the django_countries module for the list of countries. The problem is that a couple of countries have special characters, like 'Åland Islands' and 'Saint Barthélemy'.
I am calling this method to get the country name:
country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name
I know that country_label is a lazily translated proxy object from django.utils, but it does not give the right name; instead it gives 'Ã…land Islands'. Any suggestions?
Django stores unicode strings as sequences of code points and marks them as unicode for further processing. UTF-8 is a variable-length encoding that uses one to four 8-bit bytes per code point, so at some point the unicode strings Django uses have to be converted (encoded) from code points into their UTF-8 byte representation.
In the case of Åland Islands, what seems to be happening is that the UTF-8 bytes are being interpreted as if they were code points when the string is converted.
The string django_countries returns is most likely u'\xc5land Islands', where \xc5 is the Unicode code point notation of Å. In UTF-8 byte notation, \xc5 becomes \xc3\x85, where \xc3 and \x85 are each an 8-bit byte. See:
http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=xc5&mode=hex
You can also check this yourself with country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name.encode('utf-8'), which goes from u'\xc5land Islands' to '\xc3\x85land Islands'.
If you then take each byte and treat it as a separate character, you get these characters: Ã… (this is what Windows-1252 produces, since in that encoding byte 0x85 is the ellipsis "…").
See: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=xc3&mode=hex
And: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=x85&mode=hex
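To reproduce this outside Django, here is a short Python 3 sketch (Python 3's str plays the role of Python 2's unicode). It assumes the bytes are misread as Windows-1252 (Python codec name "cp1252"), where byte 0x85 is the ellipsis "…"; under strict ISO-8859-1 that byte is an invisible control character instead. The function name misread_utf8 is only an illustration, not part of Django or django_countries.

```python
def misread_utf8(text: str) -> str:
    """Hypothetical helper: encode text as UTF-8, then decode the bytes as cp1252."""
    utf8_bytes = text.encode("utf-8")  # 'Å' (U+00C5) becomes the two bytes b'\xc3\x85'
    return utf8_bytes.decode("cp1252")  # each byte is now read as a separate character


name = "\xc5land Islands"  # same as u'\xc5land Islands', i.e. 'Åland Islands'
assert name.encode("utf-8")[:2] == b"\xc3\x85"
assert misread_utf8(name) == "\xc3\u2026land Islands"  # 'Ã…land Islands'
assert misread_utf8("Saint Barth\xe9lemy") == "Saint Barth\xc3\xa9lemy"  # 'Saint BarthÃ©lemy'
```

The same mechanism explains why 'Saint Barthélemy' would show up as 'Saint BarthÃ©lemy'.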
See this snippet with the HTML notation of these characters:
<div id="test">Ã…Å</div>
So I'm guessing you have two different encodings in your application. One way to get from u'\xc5land Islands' to u'\xc3\x85land Islands' would be, in a UTF-8 environment, to encode to UTF-8, which converts u'\xc5' to '\xc3\x85', and then decode to unicode from ISO-8859-1, which gives u'\xc3\x85land Islands'. But since that isn't in the code you provided, I'm guessing it happens somewhere between the moment you set country_label and the moment your output is displayed, either automatically because of encoding settings or through an explicit conversion somewhere.
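The "two different encodings" scenario can be sketched with in-memory streams: text is written through one codec and read back through another, as might happen between two layers of an application. This is a Python 3 illustration under that assumption; write_then_read is a hypothetical name, and 'iso-8859-1' stands in for whatever the second layer actually uses.

```python
import io


def write_then_read(text: str, write_encoding: str, read_encoding: str) -> str:
    """Hypothetical helper: write text with one codec, read it back with another."""
    buffer = io.BytesIO()
    writer = io.TextIOWrapper(buffer, encoding=write_encoding)
    writer.write(text)
    writer.flush()
    writer.detach()  # keep `buffer` open once the writer is discarded
    buffer.seek(0)
    reader = io.TextIOWrapper(buffer, encoding=read_encoding)
    return reader.read()


# Matching codecs: the text survives unchanged.
assert write_then_read("\xc5land Islands", "utf-8", "utf-8") == "\xc5land Islands"
# Mismatched codecs: each UTF-8 byte becomes its own code point.
assert write_then_read("\xc5land Islands", "utf-8", "iso-8859-1") == "\xc3\x85land Islands"
```

Because ISO-8859-1 maps every byte to the code point with the same number, the result is exactly u'\xc3\x85land Islands' from the explanation above.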
FIRST EDIT:
To set the encoding for your app, add # -*- coding: utf-8 -*- at the top of your .py file and <meta charset="UTF-8"> in the <head> of your template.
And to get a unicode string from a django.utils.functional lazy proxy object, you can call unicode() (str() on Python 3), like this:
country_label = unicode(fields.Country(form.cleaned_data.get('country')[0:2]).name)
SECOND EDIT:
Another way to figure out where the problem is would be to use force_bytes (https://docs.djangoproject.com/en/1.8/ref/utils/#module-django.utils.encoding), like this:
from django.utils.encoding import force_bytes
country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name
forced_country_label = force_bytes(country_label, encoding='utf-8', strings_only=False, errors='strict')
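For a plain string, force_bytes with encoding='utf-8' amounts to encoding it as UTF-8, so a small standard-library check can complement it: call it on the value at each step (form data, country_label, template context) to see where it first becomes garbled. guess_original is a hypothetical helper, not part of Django or django_countries, and it is only a heuristic: it recognizes UTF-8 text that was mis-decoded as cp1252 or latin-1, and it could misfire on text that legitimately contains sequences like "Ã©".

```python
from typing import Optional


def guess_original(label: str) -> Optional[str]:
    """Hypothetical helper: if `label` looks like UTF-8 bytes that were decoded
    as cp1252 or latin-1, return the repaired text; otherwise return None."""
    for wrong_codec in ("cp1252", "latin-1"):
        try:
            # Undo the wrong decode to recover the bytes, then decode them as UTF-8.
            repaired = label.encode(wrong_codec).decode("utf-8")
        except UnicodeError:
            continue  # this codec cannot explain the label
        if repaired != label:
            return repaired
    return None


assert guess_original("\xc5land Islands") is None  # intact 'Åland Islands'
assert guess_original("\xc3\u2026land Islands") == "\xc5land Islands"  # 'Ã…land Islands'
```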
But since you have already tried many conversions without success, maybe the problem is more complex. Can you share your versions of django_countries and Python, and your Django app's language settings?
You can also look directly in your django_countries package (it should be in your Python site-packages directory), find the file data.py, and open it to see what it looks like. Maybe the data itself is corrupted.
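To locate that file without hunting through directories, the standard library can find an installed package. This sketch assumes the package is importable as django_countries and that data.py sits next to its __init__.py, as suggested above; both helper names are hypothetical. Opening the file strictly as UTF-8 makes Python raise UnicodeDecodeError if the file is not valid UTF-8, which would point to corrupted data.

```python
import importlib.util
import os
from typing import List, Optional


def find_country_data_file(package: str = "django_countries") -> Optional[str]:
    """Return the path of <package>/data.py if the package is installed, else None."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    path = os.path.join(os.path.dirname(spec.origin), "data.py")
    return path if os.path.exists(path) else None


def lines_mentioning(path: str, needle: str) -> List[str]:
    """Read the file strictly as UTF-8 and return the lines containing `needle`."""
    with open(path, encoding="utf-8") as handle:  # raises UnicodeDecodeError if not UTF-8
        return [line.rstrip("\n") for line in handle if needle in line]


if __name__ == "__main__":
    data_py = find_country_data_file()
    if data_py is None:
        print("django_countries/data.py not found in this environment")
    else:
        for line in lines_mentioning(data_py, "land Islands"):
            print(line)
```

Note that the search string "land Islands" also matches the Falkland Islands entry.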