I download an HTML page. The HTTP Content-Type header specifies one character encoding, and the page has a <code>meta</code> tag that specifies another. What's the correct way to handle that? I guess 'correct' isn't the right word, since nobody follows the damn standards anyway... so what's the way that will cause me the fewest problems?

Do the same as web browsers do: use the response header. When HTML is served over HTTP, the meta tag is ignored if the Content-Type response header specifies a charset. The meta tag is used only when the header gives no charset, for example when the HTML is read from the local disk file system. This is also explicitly specified by the W3C HTML 4.01 specification:
<blockquote> To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest): <ol> <li>An HTTP "charset" parameter in a "Content-Type" field.</li> <li>A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".</li> <li>The charset attribute set on an element that designates an external resource.</li> </ol> </blockquote>
Any decent HTML parser in whatever language you use should already take this into account. According to your question history you're familiar with Java, so I'd suggest grabbing Jsoup for this.
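To make the priority order quoted above concrete, here is a minimal sketch in plain Java that uses only the standard library. It is not how Jsoup or any browser implements detection: the class name `CharsetResolver`, its methods, the regular expressions, and the choice to scan only the first 1024 bytes are all assumptions made for illustration. It checks the header first, then a meta declaration, then a caller-supplied fallback, and it skips the third priority (a charset attribute on a linking element) because that only applies when the document is referenced from another one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Jsoup or any other library.
final class CharsetResolver {

    // Finds charset=value (optionally quoted) in a Content-Type header value
    // or inside a single <meta ...> tag.
    private static final Pattern CHARSET_PARAM =
            Pattern.compile("charset\\s*=\\s*[\"']?([^\\s;\"'>/]+)", Pattern.CASE_INSENSITIVE);

    // Matches one whole <meta ...> tag so the charset search stays inside it.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Priority 1: the charset parameter of the HTTP Content-Type header.
    static Optional<Charset> fromContentType(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        return firstSupported(CHARSET_PARAM.matcher(contentType));
    }

    // Priority 2: a charset declared in a meta tag near the top of the document.
    static Optional<Charset> fromMeta(String htmlPrefix) {
        Matcher tags = META_TAG.matcher(htmlPrefix);
        while (tags.find()) {
            Optional<Charset> found = firstSupported(CHARSET_PARAM.matcher(tags.group()));
            if (found.isPresent()) {
                return found;
            }
        }
        return Optional.empty();
    }

    // Applies the priorities in order; the fallback is the caller's choice.
    static Charset resolve(String contentType, byte[] body, Charset fallback) {
        Optional<Charset> fromHeader = fromContentType(contentType);
        if (fromHeader.isPresent()) {
            return fromHeader.get();
        }
        // Assumption: the meta tag is ASCII-compatible and sits in the first 1024 bytes.
        // ISO-8859-1 maps every byte to one char, so this decoding never fails.
        int length = Math.min(body.length, 1024);
        String prefix = new String(body, 0, length, StandardCharsets.ISO_8859_1);
        return fromMeta(prefix).orElse(fallback);
    }

    // Returns the first charset name that this JVM actually supports.
    private static Optional<Charset> firstSupported(Matcher matcher) {
        while (matcher.find()) {
            String name = matcher.group(1);
            try {
                if (Charset.isSupported(name)) {
                    return Optional.of(Charset.forName(name));
                }
            } catch (IllegalArgumentException e) {
                // Syntactically illegal charset name: ignore it and keep looking.
            }
        }
        return Optional.empty();
    }
}
```

With a header value of `text/html; charset=UTF-8`, `resolve` returns UTF-8 even if the page's meta tag declares ISO-8859-1, which is the situation described in the question. With a bare `text/html` header it falls through to the meta declaration. A real parser such as Jsoup handles more edge cases than this sketch does.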

Detecting character encoding in HTML


answered Oct 12 '22 00:10

BalusC

Tags:

html

http

character-encoding

Mike Baranczak
