
Why does it print funny characters? Unicode problem?

The user entered the word

éclair

into the search box.

Showing results 1 - 10 of about 140 for �air. 

Why does it show the weird question mark? I'm using Django to display it:

Showing results 1 - 10 of about 140 for {{query|safe}}
TIMEX asked Dec 10 '25 04:12


2 Answers

It's an encoding problem. Most likely your form or the output page is not UTF-8 encoded.

This article is very good reading on the issue: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

You need to check the encoding of

  • the HTML page where the user entered the word
  • the HTML page you are using to output the word
  • whether the functions you use to work with the string are multi-byte safe (though that probably isn't a problem in Python)

If the search runs against a database, you will also need to check the encoding of the database connection, as well as the encoding of your tables and columns.
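One quick way to check what actually arrived is to try a strict UTF-8 decode on the raw bytes. This is only a sketch: `is_valid_utf8` is a made-up helper name, and the Latin-1 bytes stand in for whatever a non-UTF-8 form might have submitted.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict mode raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True


# The same word as it would arrive from a UTF-8 page and from a Latin-1 page
# (Latin-1 is an assumption; the question does not say which encoding was used).
utf8_bytes = "\u00e9clair".encode("utf-8")      # b'\xc3\xa9clair'
latin1_bytes = "\u00e9clair".encode("latin-1")  # b'\xe9clair'

print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(latin1_bytes))
```

If the check fails for the bytes coming out of one particular layer (form, view, database connection), that layer is the one to look at first.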

Pekka answered Dec 12 '25 17:12


This is what you get when data that is not encoded in UTF-8 is interpreted as UTF-8.

The decoder sees the first byte of the word éclair (presumably 0xE9, which is é in ISO-8859-1 and Windows-1252) and takes it for the lead byte of a three-byte UTF-8 sequence. It consumes the next two bytes but can't decode the result, because they are not valid continuation bytes. In that case the REPLACEMENT CHARACTER � (U+FFFD) is shown instead.
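To see this at the byte level, here is a small sketch, assuming the word was submitted as Latin-1 (the question doesn't confirm that). `utf8_sequence_length` and `is_continuation` are illustrative helpers, not library functions. How many of the following bytes get swallowed along with the bad lead byte depends on the decoder: the output in the question lost "cl", while a current Python may replace only the lead byte.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Length a UTF-8 decoder expects from a lead byte; 0 if not a valid lead."""
    if lead_byte < 0x80:
        return 1  # 0xxxxxxx: plain ASCII
    if 0xC2 <= lead_byte <= 0xDF:
        return 2  # 110xxxxx
    if 0xE0 <= lead_byte <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead_byte <= 0xF4:
        return 4  # 11110xxx
    return 0      # continuation byte or invalid lead


def is_continuation(byte: int) -> bool:
    """Continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


raw = "\u00e9clair".encode("latin-1")  # b'\xe9clair' (assumed input encoding)

print(hex(raw[0]), utf8_sequence_length(raw[0]))      # lead byte and expected length
print(is_continuation(raw[1]), is_continuation(raw[2]))  # 'c' and 'l'

# Decoding the Latin-1 bytes as UTF-8 produces U+FFFD in place of the é.
garbled = raw.decode("utf-8", errors="replace")
print(garbled)
```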

So in your case you just need to make sure your data really is encoded as UTF-8.
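If you already have bytes in another encoding, re-encoding them could look roughly like this. It assumes the source encoding is Latin-1, which is a guess; `latin1_to_utf8` is a hypothetical helper name. The better fix is usually to have the form and the output page use UTF-8 in the first place so no conversion is needed.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: read bytes as Latin-1 and re-encode them as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


fixed = latin1_to_utf8(b"\xe9clair")  # assumed Latin-1 input
print(fixed)
print(fixed.decode("utf-8"))
```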

Gumbo answered Dec 12 '25 18:12


