How do browsers determine the encoding used?

1 Answer

They can guess it based on heuristics.

I don't know how good browsers are at encoding detection today, but MS Word does a very good job of it and even recognizes charsets I've never heard of. You can open a *.txt file saved in an arbitrary encoding and see for yourself.

This algorithm usually involves statistical analysis of byte patterns, like frequency distribution of trigraphs of various languages encoded in each code page that will be detected; such statistical analysis can also be used to perform language detection.

https://en.wikipedia.org/wiki/Charset_detection
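
To make the idea of trigram statistics concrete, here is a minimal sketch. It is a toy, not what any browser ships: the sample sentence, the `PROFILE` set, the candidate list and the `guess_encoding` helper are all made up for illustration, and real detectors use much larger frequency tables for many languages. It decodes the bytes with each candidate encoding and keeps the one whose result shares the most trigrams with known Russian text.

```python
def trigrams(text):
    """Return every run of three consecutive characters in text."""
    return [text[i:i + 3] for i in range(len(text) - 2)]

# Tiny stand-in for a language model (assumption: a real detector would
# use frequency tables built from a large corpus, not one sentence).
SAMPLE = "привет как дела это простой пример русского текста для проверки"
PROFILE = set(trigrams(SAMPLE))

# Hypothetical candidate list; real detectors consider many more encodings.
CANDIDATES = ["utf-8", "windows-1251", "koi8-r"]

def guess_encoding(data):
    """Pick the candidate whose decoding looks most like the sample language."""
    best, best_score = None, -1
    for enc in CANDIDATES:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding at all
        # Score = number of trigrams that also occur in the reference profile.
        score = sum(1 for t in trigrams(text) if t in PROFILE)
        if score > best_score:
            best, best_score = enc, score
    return best

message = "это простой текст"
for enc in CANDIDATES:
    print(enc, "->", guess_encoding(message.encode(enc)))
```

Decoding with the wrong encoding usually still "succeeds" for single-byte code pages, it just produces letters that form no familiar trigrams, which is why a statistical score is needed on top of a validity check.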

Firefox uses the Mozilla Charset Detectors. The way they work has been documented, and you can also change their heuristic preferences. The Mozilla Charset Detectors were even forked into uchardet, which works better and detects more languages.

[Update: Firefox has used chardetng since Firefox 73]

Chrome previously used the ICU detector but switched to CED (Compact Encoding Detection) almost 2 years ago.

None of the detection algorithms are perfect. They can guess incorrectly, because it's just guessing anyway!

This process is not foolproof because it depends on statistical data.

That's how the famous "Bush hid the facts" bug occurred. Bad guessing also introduces a security vulnerability.
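
The "Bush hid the facts" misdetection is easy to reproduce by hand. The sketch below does not run any real detector; it just forces the wrong guess (UTF-16LE) on the ASCII bytes to show what the user ended up seeing. The sentence is 18 bytes long, so it splits cleanly into nine 2-byte units, and each unit happens to land in the CJK ideograph range.

```python
# The text as it was saved: plain ASCII, one byte per character.
data = "Bush hid the facts".encode("ascii")

# What you get if a detector wrongly decides the bytes are UTF-16LE:
# every pair of ASCII bytes is read as a single 16-bit code unit.
misread = data.decode("utf-16-le")

print(len(data), "bytes ->", len(misread), "characters")
print(misread)
print([hex(ord(ch)) for ch in misread])
```

Nothing in the bytes is invalid under either interpretation, so a detector that only checks validity cannot tell the two apart.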

For all those skeptics out there, there is a very good reason why the character encoding should be explicitly stated. When the browser isn't told what the character encoding of a text is, it has to guess: and sometimes the guess is wrong. Hackers can manipulate this guess in order to slip XSS past filters and then fool the browser into executing it as active code. A great example of this is the Google UTF-7 exploit.

http://htmlpurifier.org/docs/enduser-utf8.html#fixcharset-none
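
Here is a small sketch of why a UTF-7 guess is dangerous. The `naive_filter_allows` function is a hypothetical filter written for this example, not taken from any real product. The payload contains no raw angle brackets, so the filter lets it through, but once the bytes are interpreted as UTF-7 they turn into a script tag.

```python
# "<" and ">" written in UTF-7 as "+ADw-" and "+AD4-".
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

def naive_filter_allows(data):
    """Hypothetical filter: accept input only if it has no raw angle brackets."""
    return b"<" not in data and b">" not in data

print("filter allows payload:", naive_filter_allows(payload))

# What a browser would see if it guessed (or was tricked into) UTF-7.
decoded = payload.decode("utf-7")
print("decoded as UTF-7:", decoded)
```

If the page had declared its encoding, the same bytes would have been shown as harmless literal text.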

As a result, the encoding should always be explicitly stated.
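
In practice, "explicitly stated" means a `charset` parameter in the HTTP `Content-Type` header, a `<meta charset>` tag near the top of the document, or both. The sketch below is one possible way to do it on the server side; `build_html_response` is a made-up helper for illustration, not part of any framework.

```python
import html

def build_html_response(text):
    """Return (headers, body) for a page that declares UTF-8 in two places."""
    body = (
        "<!DOCTYPE html>\n"
        # Declaration inside the document itself.
        '<html><head><meta charset="utf-8"><title>Demo</title></head>\n'
        f"<body><p>{html.escape(text)}</p></body></html>\n"
    ).encode("utf-8")  # the bytes must really be in the declared encoding
    headers = {
        # Declaration in the HTTP header.
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return headers, body

headers, body = build_html_response("café <b>")
print(headers["Content-Type"])
print(body.decode("utf-8"))
```

With either declaration present, the browser has no reason to fall back to guessing.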

130

answered Sep 28 '22 23:09

phuclv

Tags:

html

encoding

Vivek Kumar
