
Facebook charset detection mechanism?

Today I looked at the HTML source of facebook.com and found something like this:

<input type="hidden" value="€,´,€,´,水,Д,Є" name="charset_test"/>

It appears twice inside the <form>...</form>.

Any idea what this code might be useful for - some kind of server-side client charset detection? As far as I know, browser charset is being transmitted in HTTP request anyway (an "Accept-Charset" header).

Void asked Jan 06 '10 12:01


2 Answers

Any idea what this code might be useful for - some kind of server-side client charset detection?

Apparently so.

The Euro sign is useful for charset detection because there are so many ways of encoding it:

  • E2 82 AC in UTF-8
  • 88 in windows-1251
  • 80 in the other windows-125x encodings
  • A4 in ISO-8859-7, -15, and -16
  • A2 E3 in GB18030
  • 85 40 in Shift-JIS
  • etc.
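
To see the byte sequences listed above for yourself, here is a minimal sketch using Python's built-in codecs. The codec names are Python's own aliases (for example `cp1252` for windows-1252), and the helper name `euro_bytes` is made up for illustration. It needs Python 3.8 or later for `bytes.hex(" ")`.

```python
# Encode the euro sign under several codecs and collect the resulting bytes.
EURO = "\u20ac"  # the euro sign

# Python codec aliases; a small sample, not an exhaustive list.
CODECS = ["utf-8", "cp1251", "cp1252", "iso8859_15", "gb18030"]


def euro_bytes(codecs=CODECS):
    """Map each codec name to the euro sign's bytes as spaced hex, or None."""
    result = {}
    for name in codecs:
        try:
            result[name] = EURO.encode(name).hex(" ").upper()
        except UnicodeEncodeError:
            # This codec has no euro sign at all.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, encoded in euro_bytes().items():
        print(f"{name:12} {encoded}")
```

Because the same character comes out as different bytes under each encoding, a server that knows the expected character can work backwards from the bytes it received.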

As far as I know, browser charset is being transmitted in HTTP request anyway (an "Accept-Charset" header).

It's supposed to be transmitted in the HTTP Content-Type header, but that doesn't mean that user agents actually get it right.
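
As a rough illustration, this is how a server could read the declared charset from a Content-Type header value, using only the Python standard library. The function name `declared_charset` is hypothetical. When the header carries no `charset` parameter, the function returns `None` and the server is left to guess.

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


if __name__ == "__main__":
    print(declared_charset("application/x-www-form-urlencoded; charset=UTF-8"))
    print(declared_charset("application/x-www-form-urlencoded"))
```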

dan04 answered Oct 22 '22 17:10


I guess the receiving script checks this value to make sure the client sent the request properly encoded as UTF-8. Because they know which characters to expect, they may even be able to detect the actual encoding on the fly.
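
A minimal sketch of what such a check might look like. This is an assumption about the approach, not Facebook's actual code. The names `EXPECTED`, `CANDIDATES`, and `detect_encoding` are invented for illustration: the server decodes the raw bytes of the hidden field with each candidate encoding and keeps the first one that reproduces the known value.

```python
# Known value of the hidden field, as served in the page.
EXPECTED = "€,´,€,´,水,Д,Є"

# Candidate encodings to try, in order of preference (an arbitrary sample).
CANDIDATES = ["utf-8", "cp1252", "cp1251", "gb18030"]


def detect_encoding(raw: bytes, expected: str = EXPECTED, candidates=CANDIDATES):
    """Return the first candidate that decodes raw to expected, else None."""
    for name in candidates:
        try:
            if raw.decode(name) == expected:
                return name
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding; try the next one.
            continue
    return None


if __name__ == "__main__":
    print(detect_encoding(EXPECTED.encode("utf-8")))
    # A legacy encoding cannot represent every test character, so this
    # demo uses a shorter expected value for the cp1252 case.
    print(detect_encoding("€,´".encode("cp1252"), expected="€,´"))
```

Note that a legacy single-byte encoding cannot represent all of the test characters at once, so a real check would have to tolerate whatever the browser substitutes for the characters it cannot encode. The exact-match comparison here is a simplification.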

If I remember correctly (I had to deal with it once), there were problems with form encoding in IE6 in some situations.

Pekka answered Oct 22 '22 17:10