
Is lang=unknown attribute valid?

Tags: html, lang

I have an HTML document written in a specific language (English).

I have defined a lang attribute on the <html> tag:

<html lang="en">

Some of the text on the page is written in another language (for example, French):

<span lang="fr">
  blabla...
</span>

But if I cannot identify the language and only know that it is NOT English, can I set "unknown" as a valid value for the lang attribute?

<span lang="unknown">
  blabla...
</span>

I read this in the W3C documentation, but I am not sure whether "the default value is [...] unknown" means that "unknown" is an actual value...

http://www.w3.org/TR/html4/struct/dirlang.html

lang = language-code [CI] This attribute specifies the base language of an element's attribute values and text content. The default value of this attribute is unknown.

Nicolas Payart asked Mar 14 '13 17:03


1 Answer

The wording in the HTML 4.01 specification is obscure; the value unknown is not a valid language tag, and the spec uses the word “unknown” as a normal English word. That is, the default value is a value that indicates that the language is not known, but this value is not explicitly specified.

The spec is partly outdated in this area, as it refers to a superseded RFC on language tags. The current RFC is RFC 5646, Tags for Identifying Languages, also known as BCP (Best Current Practice) 47. It refers, among other things, to ISO 639-2 for primary language subtags, and that standard contains the code und for “undetermined”. So technically you could use lang=und, but the RFC says: “This subtag SHOULD NOT be used unless a language tag is required and language information is not available or cannot be determined. Omitting the language tag (where permitted) is preferred.”

And this is the approach adopted in HTML5 CR (Candidate Recommendation), which says about lang: “Setting the attribute to the empty string indicates that the primary language is unknown. [BCP47]”

Thus, for text in an unidentifiable language you can use e.g. <span lang="">...</span>.

This is, in principle, useful when you have indicated the language at a higher level of element nesting. Setting lang="" may mean that user agents disable spelling checks and language-specific formatting, for example, though this is still rather theoretical.
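
To put the options above side by side, here is a minimal sketch of a page that combines them. The title and sample text are placeholders invented for illustration. The comments describe what the quoted specifications say, not the behavior of any particular browser.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Language annotation sketch</title>
</head>
<body>
  <!-- No lang here: the paragraph inherits "en" from the root element -->
  <p>This paragraph is in English.</p>

  <!-- Known language: override the inherited value with a valid tag -->
  <p><span lang="fr">blabla...</span></p>

  <!-- Unknown language: the empty string marks the primary language
       as unknown, so the inherited "en" no longer applies -->
  <p><span lang="">blabla...</span></p>

  <!-- "und" is a valid subtag, but RFC 5646 prefers omitting the tag
       where that is permitted, so lang="" is the better fit in HTML -->
  <p><span lang="und">blabla...</span></p>
</body>
</html>
```

Without lang="" on the third span, its text would be treated as English by inheritance, which is exactly what the question wants to avoid.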

Jukka K. Korpela answered Nov 03 '22 01:11