For HTTP responses with Content-Types suggesting character data, which charset should be assumed by the client if none is specified?

Tags: content-type, http, character-encoding, default, rfc2616

If no charset parameter is specified in the Content-Type header, RFC 2616 section 3.7.1 seems to imply that ISO-8859-1 should be assumed for media types of type "text":

When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.

Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value.

However, I routinely see applications that serve JavaScript files with Content-Type values like "application/x-javascript" (i.e. no charset param), even when these scripts contain non-ASCII characters encoded as UTF-8, which would be corrupted if interpreted as ISO-8859-1.

This does not seem to cause problems for clients. How do clients know to interpret the bytes as UTF-8? Is there a rule for other character-data subtypes that implies UTF-8 should be the default? Where is this documented?
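To make the corruption concrete, here is a minimal Python sketch. The script fragment is made up for illustration; it is encoded as UTF-8 and then decoded as ISO-8859-1, which is what a client applying the ISO-8859-1 default would do:

```python
# Hypothetical script content containing one non-ASCII character.
source = 'var greeting = "café";'

# The bytes a server would send if the file is saved as UTF-8.
utf8_bytes = source.encode("utf-8")

# What a client ends up with if it assumes ISO-8859-1 instead.
misread = utf8_bytes.decode("iso-8859-1")

print(misread)  # var greeting = "cafÃ©";
```

The two-byte UTF-8 sequence for "é" is read as two separate ISO-8859-1 characters, so the string literal silently changes.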

asked Feb 24 '10 11:02

rewbs

1 Answer

All the major browsers I've checked (IE, Firefox and Opera) completely ignore this part of the RFC.

If you are interested in the algorithm for auto-detecting the charset from the data, see the Mozilla Firefox link below.

A small note about content types: only text has a character set. It is reasonable to assume that browsers handle application/x-javascript the same way they handle text/javascript (except IE6, but that is another subject).

Internet Explorer will use the default charset (probably stored in the registry), as noted:

By default, Internet Explorer uses the character set specified in the HTTP content type returned by the server to determine this translation. If this parameter is not given, Internet Explorer uses the character set specified by the meta element in the document. It uses the user's preferences if no meta element is specified.

Source: http://msdn.microsoft.com/en-us/library/ms537500%28VS.85%29.aspx
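The precedence described in that quote (HTTP header, then meta element, then user preference) can be sketched roughly as follows. This is an illustration only, not Internet Explorer's actual code: the function names, the regular expression, the 1024-byte scan window and the "windows-1252" stand-in for the user's preference are all assumptions.

```python
import re
from email.message import Message
from typing import Optional

# Loose pattern for a charset declaration inside a <meta> tag (illustrative only).
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def charset_from_content_type(header_value: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lower-cased, or None."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def charset_from_meta(body: bytes) -> Optional[str]:
    """Look for a meta charset declaration near the start of the body."""
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def pick_charset(content_type: Optional[str], body: bytes,
                 user_default: str = "windows-1252") -> str:
    """Header first, then meta element, then the (assumed) user preference."""
    return (charset_from_content_type(content_type)
            or charset_from_meta(body)
            or user_default)
```

For a script served as "application/x-javascript" with no charset parameter and no meta element, this chain falls straight through to the user default, which matches the behaviour the quote describes.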

Mozilla Firefox attempts to auto-detect the charset, as described here:

This paper presents three types of auto-detection methods to determine encodings of documents without explicit charset declaration.

Source: http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
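As a much simpler illustration of why UTF-8 in particular is easy to detect from the data alone, here is a sketch that is not one of the methods from the Mozilla paper. It assumes that bytes which decode strictly as UTF-8 are UTF-8, and otherwise falls back to a single-byte encoding; the function name and the fallback value are hypothetical.

```python
def guess_encoding(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Guess 'utf-8' if the bytes form valid UTF-8, else return the fallback."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"  # UTF-8 byte order mark
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return fallback  # invalid UTF-8, so assume the single-byte fallback
    return "utf-8"
```

Pure ASCII input is reported as UTF-8, which is harmless because ASCII decodes identically in both encodings. Text in a single-byte encoding that contains non-ASCII characters rarely happens to be valid UTF-8, which is what makes this check useful in practice.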

Opera uses auto-detection too, as documented:

If the transport protocol provides an encoding name, that is used. If not, Opera will look at the page for a charset declaration. If this is missing, Opera will attempt to auto-detect the encoding, using the domain name to see if the script is a CJK script, and if so which one. Opera can also auto-detect UTF-8.

Source: http://www.opera.com/docs/specs/opera9/

answered Sep 20 '22 01:09

Sagi
