What's the correct encoding of HTTP get request strings?

Does the HTTP standard (or some other standard) define which character encoding should be applied to special characters before they are percent-encoded in a URL as %XX sequences? If it doesn't, is there a way to specify which encoding is used? It seems that most browsers send the data as UTF-8.

asked Oct 10 '09 22:10

JtR

2 Answers

Does the HTTP standard or something define which encoding should be used on special characters before they are encoded in url with %XXs?

The HTTP standard, no. But another standard, IRI, can come into play.

URIs are explicitly (once %-decoded) byte sequences. What Unicode characters those bytes map onto is not specified by the URI standard or the HTTP standard for http:-scheme URIs.
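To illustrate the point that a %-decoded URI is only a byte sequence, here is a minimal Python sketch. The sample string `%C3%A9` and the variable names are chosen purely for illustration; the same bytes produce different text depending on which charset the receiver assumes.

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding yields raw bytes, not characters.
raw = unquote_to_bytes("%C3%A9")

# The receiver has to pick a charset; the URI itself does not say which.
as_utf8 = raw.decode("utf-8")         # one character: é
as_latin1 = raw.decode("iso-8859-1")  # two characters: Ã©

print(raw, as_utf8, as_latin1)
```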

Specifically for query parameters: web browsers use the encoding of the originating page to build a form-submission GET URL. So if you have a page in ISO-8859-1 and you put ‘é’ in a search box you'll get ‘?search=%E9’, but if you do the same in a page encoded as UTF-8 you'll get ‘?search=%C3%A9’. If you don't serve your form page with an explicit charset, the browser will guess, which you don't want: it makes it impossible to know which encoding the submission will arrive in.
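The browser behaviour can be imitated with Python's `urllib.parse.urlencode`, which accepts an `encoding` argument. This is only a rough sketch of what a form submission does; `form_query` and `page_charset` are hypothetical names introduced for the example.

```python
from urllib.parse import urlencode

def form_query(params, page_charset):
    """Build a query string the way a form on a page in page_charset would."""
    # Characters are first encoded to bytes in the page's charset,
    # then each non-safe byte is written as %XX.
    return urlencode(params, encoding=page_charset)

latin1_query = form_query({"search": "é"}, "iso-8859-1")
utf8_query = form_query({"search": "é"}, "utf-8")

print(latin1_query)  # search=%E9
print(utf8_query)    # search=%C3%A9
```

On the server side, the same charset has to be assumed when decoding, which is why serving the form page with an explicit charset matters.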

For the other parts of a URL, a browser won't generate them itself, but if you supply it with non-ASCII characters in links it will usually encode them as UTF-8. This is not reliable, as it depends on browser and locale settings, so it's best not to rely on it at the moment.

The standard that properly allows non-ASCII characters in links is IRI. IRI converts to URI by UTF-8-%-encoding most of the URL, but the hostname is converted using Punycode instead. For compatibility it is best not to rely on browsers understanding IRIs in links yet. Instead, UTF-8-then-%-encode your path and parameter characters yourself. They will still appear as the right characters in the address bar in modern browsers; unfortunately IE won't display the decoded-character IRI form in all cases, depending on language settings.

The Wikipedia IRI for the Greek capital gamma character is:

http://en.wikipedia.org/wiki/Γ

Encoded into a URI, it is:

http://en.wikipedia.org/wiki/%CE%93
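The conversion described above can be sketched in Python using only the standard library. This is a simplified illustration rather than a full RFC 3987 implementation: `iri_to_uri` is a hypothetical helper, it ignores any userinfo part, and it relies on Python's built-in `idna` codec for the hostname.

```python
from urllib.parse import urlsplit, urlunsplit, quote

def iri_to_uri(iri):
    """Convert an IRI to a URI: Punycode the host, UTF-8-%-encode the rest."""
    parts = urlsplit(iri)
    # Hostname labels are converted with IDNA/Punycode, not %-encoding.
    # (Simplification: any user:password@ prefix is dropped.)
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # quote() encodes to UTF-8 by default; "%" is kept safe so that
    # already-encoded sequences are not encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))

print(iri_to_uri("http://en.wikipedia.org/wiki/Γ"))
# http://en.wikipedia.org/wiki/%CE%93
```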

answered Sep 17 '22 11:09

bobince

Per RFC 2616,

   CHAR           = <any US-ASCII character (octets 0 - 127)>

and

   token          = 1*<any CHAR except CTLs or separators>

   separators     = "(" | ")" | "<" | ">" | "@"
                  | "," | ";" | ":" | "\" | <">
                  | "/" | "[" | "]" | "?" | "="
                  | "{" | "}" | SP | HT

and URIs are tokens with various specific separators. So, in theory, nothing but US-ASCII should be there. (In practice, since the ISO-8859-1 extension to US-ASCII is used in many other spots in the HTTP specs, it's not unusual to find HTTP implementations which support ISO-8859-1 rather than just US-ASCII, but strictly speaking that's not standards-compliant HTTP).

answered Sep 19 '22 11:09

Alex Martelli
