What is the 'lang' attribute of the <html> tag used for?

Tags:

html

In HTML, it's considered good practice to set a lang attribute on <html>, e.g. <html lang="en">.

How is this useful?

If it is meant for translation, it doesn't seem to work: even if the language is set to English and all the text in the document is Chinese, Google Translate detects the page as Chinese, not English (which suggests Google ignores the lang attribute).

608

asked Feb 01 '13 15:02

Santosh Kumar

3 Answers

I am quoting this from W3C:

Declaring language in HTML

Always use a language attribute on the html tag to declare the default language of the text in the page. When the page contains content in another language, add a language attribute to an element surrounding that content.

Use the lang attribute for pages served as HTML, and the xml:lang attribute for pages served as XML. For XHTML 1.x and HTML5 polyglot documents, use both together.

Use language tags from the IANA Language Subtag Registry.
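As a minimal sketch of the advice quoted above (the page title and sample sentence are illustrative and not taken from the W3C page), a page whose default language is English with one French phrase inside it might look like this:

```html
<!DOCTYPE html>
<!-- Default language for the whole page, using the "en" subtag -->
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>lang attribute sketch</title>
</head>
<body>
    <!-- This paragraph inherits lang="en" from the html element -->
    <p>
        The French for "I like bread" is
        <!-- Content in another language gets its own lang on a surrounding element -->
        <span lang="fr">J'aime le pain</span>.
    </p>
</body>
</html>
```

For a page served as XML, the same declaration would be written with xml:lang, as the quoted guidance says.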

Also a good read is "Why use the language attribute?".

112

answered Oct 17 '22 15:10

NullPoiиteя

You asked "how is this useful".

"The <lang=> attribute can be used to declare the language of a Web page or a portion of a Web page. This is meant to assist search engine spiders, page formatting and screen reader technology"

Source: http://symbolcodes.tlt.psu.edu/web/tips/langtag.html (Wayback Machine link)

No mention of translation - but often a search engine spider will not want to parse through a document "in the wrong language" - its index file will grow (lots of new words), and the results will not be useful to the user (who cannot read the language, and who is using the wrong search terms).

The advent of smart translation technology (like Google's, referred to above) means that some search engines can see a page in one language, translate it, and figure out that someone searching for "cow" may be interested in this page that mentions "vache" and has lang="fr".

answered Oct 17 '22 16:10

Floris

The lang attribute is needed by screen readers to let them pronounce words correctly, and also (perhaps surprisingly) sometimes needed to allow text to be rendered correctly by the browser.

`lang` needed for speech synthesis

Some blind or visually impaired people use speech-synthesizing screen readers that speak the words on the screen. Since two words from different languages that are spelt identically may be pronounced differently, such speech synthesis cannot be done without knowing the language of the text. For instance, the word "pain" in English is pronounced completely differently to the word "pain" in French, so a screen reader that doesn't know whether it's reading English or French won't know how to pronounce "pain".

Using the lang attribute indicates to a screen reader what language some text is in and thus allows it to pronounce the word correctly.

I recorded a demonstration of this using Narrator, the built-in screen reader for Windows. (If you'd like to reproduce this, note that you'll need to have both the English and French voice packages installed via the Speech settings page in the Windows Settings app, and have English as your default voice.) The demo uses an HTML page with the following content:

<h5>No lang specified:</h5>
<p>J'aime le pain</p>

<h5>French:</h5>
<p lang="fr">J'aime le pain</p>

As you can hear in the recording I uploaded at https://www.youtube.com/watch?v=7J1I65sn1CQ, Microsoft George (the default English voice) butchers the pronunciation of the French phrase (pronouncing it "Jay aim le payne"), whereas Microsoft Hortense (the default French voice) pronounces it correctly.

`lang` needed for text rendering

Perhaps surprisingly, the benefits of the lang attribute are not limited to disabled people using speech-synthesizing assistive tech. Setting lang can also affect text rendering, since the correct way to render some text can be language-dependent.

There are a couple of different mechanisms by which the lang you set can affect how text gets rendered:

- different fonts being selected based on the lang attribute, either
  - based on the browser's default font selection rules, or
  - because you've explicitly set up language-specific fonts using :lang selectors in your CSS;
- or fonts having language-specific rules included in them, such as language-specific alternative glyphs or language-specific rules about which sequences of characters to substitute with a ligature

Below are a couple of interesting examples of such language-specific rendering that I was able to find.

Language-dependent forms of Han characters

Many Han (Chinese) characters have been adopted in other East Asian languages, such as Japanese (where such characters are called "Kanji"). The proper way to draw these characters sometimes differs between Chinese and the other languages that have assimilated them, yet, due to Unicode's Han unification, there is only a single Unicode code point to represent the character, rather than a distinct code point for each language-specific variant of it. Several examples are listed in the "Examples of language-dependent glyphs" section of the Wikipedia article on Han unification.

When rendering such a character, in order to know which glyph to display (for instance, whether to display the Japanese Kanji or the Chinese hanzi), the browser needs to know the language of the text in which the character appears.

To see your browser take the text's language into account in this way, save the following HTML to a file and open it in your browser:

<!DOCTYPE html>
<meta charset="utf-8">
Chinese: <span lang="zh">飴</span>
<br>
Japanese: <span lang="ja">飴</span>

Note that the same character, 飴, is used in both spans. But they display differently in the browser, at least in Chrome on my Windows PC:

Screenshot demonstrating the point above

As you can see, the Kanji rendered in the span marked as Japanese is different in several ways from the hanzi rendered in the span marked as Chinese. By inspecting each span in the Chrome dev tools and looking at the "Rendered Fonts" section, I can see that this is because Chrome has used different fonts for the two spans - namely Microsoft YaHei for the Chinese span and Yu Gothic for the Japanese one.
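The other font-selection mechanism listed earlier, explicit :lang selectors, can make the same choice without relying on browser defaults. The following is only a sketch: it assumes the two fonts named above are installed (they were on my Windows PC, but may not be on yours) and falls back to a generic sans-serif otherwise.

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <style>
        /* :lang() matches elements by their language, which is
           inherited from the nearest ancestor with a lang attribute */
        :lang(zh) {
            font-family: "Microsoft YaHei", sans-serif;
        }
        :lang(ja) {
            font-family: "Yu Gothic", sans-serif;
        }
    </style>
</head>
<body>
    Chinese: <span lang="zh">飴</span>
    <br>
    Japanese: <span lang="ja">飴</span>
</body>
</html>
```

Because :lang() follows inheritance, a rule like this also applies to elements nested inside a span or paragraph that carries the lang attribute, not just to the element itself.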

`fi` ligatures getting disabled for Turkish text

As described at https://en.wikipedia.org/wiki/Ligature_(writing)#Stylistic_ligatures, many fonts use a stylistic ligature that merges the letters fi into a single combined glyph, in which the top-right corner of the f merges with the dot above the i. In most languages, like English, this looks pretty and doesn't make the text any less readable.

Image showing the combined "fi" glyph

However, such a ligature is problematic in Turkish or other languages where the dotted and dotless I both exist and are distinct characters, because it makes it impossible to tell whether it represents fi (an f followed by a dotted i) or fı (an f followed by a dotless ı).

For that reason, fonts that include a substitution of fi with such a ligature will hopefully have that substitution only occur in languages for which it's appropriate. As I understand it, in OpenType, such rules are implemented by making "features" in the font specific to particular "language systems" via the Language System Table.

To see this in action, I downloaded a font with such a fi ligature - specifically Okta Neue - and created the following demo page:

<style>
    @font-face {
        font-family: oktaneue;
        src: url("Groteskly Yours - Okta Neue UltraLight.otf");
    }
    * {
        font-family: oktaneue;
    }
</style>
<span lang="en">Lütfiye</span>
<br>
<span lang="tr">Lütfiye</span>

Note that this time - unlike in the earlier example with hanzi and Kanji - both spans are using the same font. But, because the font itself contains language-specific features, the spans nonetheless render differently:

Screenshot of the aforementioned example page

As you can see, the fi ligature gets used for the span labelled as English, but not for the one labelled as Turkish - which is what we wanted!
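If a font has the fi ligature but no Turkish-specific rule for it, a similar result can probably be approximated from CSS. This is a sketch under assumptions, not something I tested with Okta Neue: it assumes the font exposes the ligature as a "common" ligature, which is what font-variant-ligatures: no-common-ligatures switches off. It combines that with the :lang selector mentioned earlier.

```html
<style>
    /* Assumption: the font in use does not disable the fi ligature
       for Turkish on its own, so we turn common ligatures off
       for any text marked as Turkish */
    :lang(tr) {
        font-variant-ligatures: no-common-ligatures;
    }
</style>
<span lang="en">Lütfiye</span>
<br>
<span lang="tr">Lütfiye</span>
```

Either way, the text still has to be marked up with the correct lang for the rule to apply.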

answered Oct 17 '22 15:10

Mark Amery
