According to RFC 2109 and RFC 2965, a cookie's value can be either an HTTP token or a quoted string, and a token can't include non-ASCII characters.
However, I found that Firefox (3.0.6) sends cookies containing UTF-8 strings as-is, and the three web servers I tested (apache2, lighttpd, nginx) pass these strings as-is to the application.
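To make the "token can't include non-ASCII" rule concrete, here is a minimal Python sketch of the token grammar that RFC 2109/2965 borrow from HTTP/1.1 (RFC 2616): token = 1*<any CHAR except CTLs or separators>, where CHAR is US-ASCII. The function name `is_rfc_token` is hypothetical, and it checks only the token alternative, not quoted-string.

```python
# Separators from the RFC 2616 grammar, which RFC 2109/2965 reuse for "token".
SEPARATORS = frozenset('()<>@,;:\\"/[]?={} \t')


def is_rfc_token(value: str) -> bool:
    """Return True if value matches token = 1*<any CHAR except CTLs or separators>.

    CHAR is US-ASCII (0-127) and CTL is 0-31 plus 127 (DEL), so any
    non-ASCII character disqualifies the value.
    """
    if not value:
        return False  # a token needs at least one character
    for ch in value:
        code = ord(ch)
        if code > 127:                # outside US-ASCII: not a CHAR
            return False
        if code < 32 or code == 127:  # control character (CTL)
            return False
        if ch in SEPARATORS:          # separator such as ';', '=' or space
            return False
    return True


for v in ["1234", "ארתיום"]:
    print(repr(v), is_rfc_token(v))
```

Under this strict reading, `wikipp=1234` is fine, but `wikipp_username=ארתיום` is not a valid token value.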
For example, here is the raw request from the browser:
$ nc -l -p 8080
GET /hello HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.9) Gecko/2009050519 Firefox/2.0.0.13 (Debian-3.0.6-1)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: windows-1255,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: wikipp=1234; wikipp_username=ארתיום
Cache-Control: max-age=0
And the raw value of the HTTP_COOKIE CGI variable under apache, nginx and lighttpd:
wikipp=1234; wikipp_username=ארתיום
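Since the servers pass the header through untouched, the application receives the UTF-8 bytes and has to decode them itself. A minimal Python sketch of a lenient parser, assuming the client used UTF-8 (nothing in the Cookie header says so); `parse_cookie_header` is a hypothetical helper, and it does not unquote quoted-string values.

```python
def parse_cookie_header(raw: bytes, encoding: str = "utf-8") -> dict[str, str]:
    """Split a raw Cookie header value into a name -> value dict.

    The header carries no charset, so the encoding is an assumption;
    errors="replace" keeps malformed bytes from raising an exception.
    """
    cookies: dict[str, str] = {}
    text = raw.decode(encoding, errors="replace")
    for pair in text.split(";"):
        name, sep, value = pair.strip().partition("=")
        if not sep or not name:
            continue  # skip fragments that are not name=value
        cookies.setdefault(name, value)  # keep the first occurrence of a name
    return cookies


# The bytes Firefox sent, as the servers passed them through in HTTP_COOKIE.
raw = "wikipp=1234; wikipp_username=ארתיום".encode("utf-8")
print(parse_cookie_header(raw))
```

In a Python 3 CGI script on POSIX, `os.environb.get(b"HTTP_COOKIE", b"")` should give the undecoded bytes to feed into such a function.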
What am I missing?
Abstract (RFC 2109): This document specifies a way to create a stateful session with HTTP requests and responses. It describes two new headers, Cookie and Set-Cookie, which carry state information between participating origin servers and user agents.
Cookies are sent with every request, so they can worsen performance (especially on mobile data connections). Modern APIs for client storage are the Web Storage API (localStorage and sessionStorage) and IndexedDB.
A cookie is a piece of data from a website that is stored in the web browser so that the website can retrieve it later. Cookies let the server recognize that a user has returned to a particular website.
RFC 2109 (Feb 1997) is obsolete and was superseded by RFC 2965 (Oct 2000), according to the Internet Official Protocol Standards (STD 1, RFC 5000).
You may also be interested in a more recent draft (March 7, 2010) that revises RFC 2965.
The only definition of a token in RFC 2965 is:
informally, a sequence of non-special, non-white space characters
I wouldn't consider the entirety of UTF-8 to be disallowed by that definition, only the characters that could be mistaken for control or syntax characters.
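As a rough illustration of this reading, here is a minimal Python sketch of a lenient token check. It assumes that "special" means the RFC 2616 separators and that only ASCII control characters and white space must also be excluded; `is_lenient_token` is a hypothetical name, and this is one interpretation of the informal wording, not what any RFC or server specifies.

```python
# Assumed "special" characters: the RFC 2616 separators (includes SP and HT).
SEPARATORS = frozenset('()<>@,;:\\"/[]?={} \t')


def is_lenient_token(value: str) -> bool:
    """Informal RFC 2965 reading: non-special, non-white-space characters.

    ASCII controls, separators and white space are rejected;
    non-ASCII characters (e.g. decoded UTF-8 text) are allowed.
    """
    if not value:
        return False
    for ch in value:
        if ord(ch) < 32 or ord(ch) == 127:     # ASCII control character
            return False
        if ch in SEPARATORS or ch.isspace():   # syntax character or white space
            return False
    return True


for v in ["1234", "ארתיום", "two words", "a;b"]:
    print(repr(v), is_lenient_token(v))
```

Under this reading, `ארתיום` is accepted, while values that would break the `name=value; name=value` syntax are still rejected.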