RFC 6265, Section 6.1 specifies that user agents should allow at least 4096 bytes per cookie.
To work out the number of *characters* allowed per cookie, I need to know the character encoding used for cookies, since the RFC specifies the maximum size per cookie in bytes, not characters.
How do I know which encoding is used to store cookies?
Is it determined by the character encoding of the programming language used to create the cookies (e.g. PHP, JavaScript), or by the character encoding used by the browser storing them?
I conducted a few tests, and Firefox, Chrome and Opera all appear to use UTF-8 for cookie storage. The encoding obviously affects the maximum number of characters you can store in a cookie on a given client.
Suspecting that browsers use UTF-8 as the character encoding for cookies, I ran tests with a one-byte UTF-8 character (`1`), a two-byte character (`£`), a three-byte character (`畀`), and a four-byte character (`𝆏`). I've pasted the results below.
Every cookie used a single-byte cookie name, and the character counts exclude the one-byte name and the `=` separating the cookie name from the cookie value. The value in `[]` beside each Unicode character is its UTF-8 byte sequence in hex.
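As a sanity check on the counts that follow, the arithmetic can be sketched in JavaScript (assuming the RFC's 4096-byte budget minus the two bytes for the one-character name and the `=`; the computed numbers match Chrome and Opera, while Firefox's extra byte explains its slightly higher counts):

```javascript
// How many copies of each test character fit in a 4096-byte cookie,
// after subtracting the 1-byte name and the "=" separator.
const utf8Bytes = (s) => new TextEncoder().encode(s).length;

const budget = 4096 - 2; // 4094 bytes left for the value

const maxChars = (ch) => Math.floor(budget / utf8Bytes(ch));

for (const ch of ["1", "£", "畀", "𝆏"]) {
  console.log(`${ch}: ${utf8Bytes(ch)} byte(s) each, ${maxChars(ch)} chars fit`);
}
// 1 → 4094, £ → 2047, 畀 → 1364, 𝆏 → 1023
```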
**FF 31.0**
Firefox relaxes the RFC limit by one byte, allowing 4097 bytes per cookie.
- `1` [0x31] -- 4095 characters
- `£` [0xC2, 0xA3] -- 2047 characters
- `畀` [0xE7, 0x95, 0x80] -- 1365 characters
- `𝆏` [0xF0, 0x9D, 0x86, 0x8F] -- 1023 characters

**Chrome 36.0.1985.143**
- `1` [0x31] -- 4094 characters
- `£` [0xC2, 0xA3] -- 2047 characters
- `畀` [0xE7, 0x95, 0x80] -- 1364 characters
- `𝆏` [0xF0, 0x9D, 0x86, 0x8F] -- 1023 characters

**Opera 24.0.1558.17**
- `1` [0x31] -- 4094 characters
- `£` [0xC2, 0xA3] -- 2047 characters
- `畀` [0xE7, 0x95, 0x80] -- 1364 characters
- `𝆏` [0xF0, 0x9D, 0x86, 0x8F] -- 1023 characters

**IE 8.0.6001.19518**
IE relaxes the limit further, to 5117 bytes per cookie, but also enforces a maximum total cookie size per domain (in this case, the limit found was 10234 characters).
- `1` [0x31] -- 5115 characters
- `£` [0xC2, 0xA3] -- 5115 characters
- `畀` [0xE7, 0x95, 0x80] -- 5115 characters
- `𝆏` [0xF0, 0x9D, 0x86, 0x8F] -- 2557 characters

IE seems to use ECMAScript's notion of characters. ECMAScript exposes characters as 16-bit unsigned integers (the encoding may be either UTF-16 or UCS-2 and is left as an implementation choice). The four-byte character chosen for the tests occupies two 16-bit code units in UTF-16, and since ECMAScript counts each 16-bit integer as one character, `"𝆏".length === 2` evaluates to `true`. This leads `𝆏` to be counted as two characters.
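The code-unit counting that trips IE up can be seen directly in JavaScript:

```javascript
// "𝆏" (U+1D18F) lies outside the Basic Multilingual Plane, so UTF-16
// represents it with a surrogate pair: two 16-bit code units.
const s = "𝆏";

console.log(s.length);                            // 2 (UTF-16 code units)
console.log([...s].length);                       // 1 (Unicode code points)
console.log(new TextEncoder().encode(s).length);  // 4 (UTF-8 bytes)
```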
It seems to be determined more by the programmers behind the browser than by the programming language. Cookie values are usually URL-encoded, but there is no requirement that they be.
Have a look at this answer, which completes your study (adding the Safari special case). This one might help too.
No matter how cookies are stored internally by the browser, they eventually have to be transferred in the `Set-Cookie` and `Cookie` HTTP header fields. It is the encoded length of those fields that the authors of the RFC most probably had in mind; at least that would be the case in most RFCs, so why not assume it here. Consequently, "the size of a cookie" depends on how it will be encoded within an HTTP header.
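A minimal sketch of that point of view (the helper names here are made up for illustration): the size that counts is the byte length of the serialized `name=value` pair, after whatever encoding is applied.

```javascript
// Hypothetical helpers: serialize a cookie pair the common way
// (URL-encoding the value) and measure its on-the-wire byte length.
const cookiePair = (name, value) => `${name}=${encodeURIComponent(value)}`;
const wireBytes = (pair) => new TextEncoder().encode(pair).length;

const pair = cookiePair("c", "£"); // URL-encoding turns "£" into "%C2%A3"
console.log(pair, wireBytes(pair)); // "c=%C2%A3" is 8 bytes on the wire
```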
According to the standard, request header field values should consist of

> the OCTETs making up the field-value and consisting of either *TEXT or combinations of token, separators, and quoted-string

where *TEXT, in turn:

> MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047.
RFC 2047 defines what is known as "MIME encoding" and, as I read it, has some funny rules. Namely, to encode a foreign charset you must use either the "quoted-printable" format, `=?UTF-8?Q?=48=65=6c=6c=6f?=`, or the "Base64" format, `=?UTF-8?B?SGVsbG8=?=`. (Both examples here encode the word "Hello"; the first uses 27 bytes, the second 20 -- not counting the cookie name and attributes.)
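Those two encoded-word lengths are easy to verify:

```javascript
// RFC 2047 encoded words for "Hello": quoted-printable vs. Base64.
const q = "=?UTF-8?Q?=48=65=6c=6c=6f?=";
const b = "=?UTF-8?B?SGVsbG8=?=";

console.log(q.length); // 27
console.log(b.length); // 20

// The Base64 payload decodes back to "Hello" (Node.js):
console.log(Buffer.from("SGVsbG8=", "base64").toString("utf8")); // "Hello"
```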
Moreover, according to RFC 2047 you may not have "encoded words" longer than 76 characters; hence, if I understand things correctly, longer cookie values would have to be encoded as a series of 76-byte pieces, each starting with the `=?UTF-8?Q?` mumbo-jumbo.
I tested what happens when a non-ASCII (Russian-language) cookie is set using PHP via Apache. The resulting `Set-Cookie` header had no charset specification, used URL-encoding, and was longer than 76 bytes (so much for the standards, right?):
CookieName=%D0%92+%D0...%B0%D0%B9; expires=Thu, 11-Sep-2014 19:59:18 GMT; path=/tmp/; domain=.some.domain.
The total length of the cookie value (with attributes), corresponding to an otherwise 176-character sentence, was 923 bytes.
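That roughly fivefold blow-up is what URL-encoding of Cyrillic text predicts (using a made-up sample word here, not the original cookie value):

```javascript
// Each Cyrillic letter is 2 bytes in UTF-8, and URL-encoding turns each
// byte into a 3-character %XX escape, so one letter costs 6 bytes.
const encoded = encodeURIComponent("Вася"); // hypothetical sample value
console.log(encoded);        // "%D0%92%D0%B0%D1%81%D1%8F"
console.log(encoded.length); // 24 = 4 letters × 6 bytes each
```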
To summarize, I don't think you can get a strict answer to your question, but it's a fun question nonetheless.