 

What is the correct way to use the lang attribute with phonetic pronunciations (if at all)?

Some languages that are normally written in a non-Latin script, such as Hindi, Russian or Japanese, have an accepted transliteration into Latin characters. For example, the Hindi for 'The man is eating', written in Devanagari script, is 'आदमी खा रहा है।'. Transliterated, it would be 'Aadmi kha raha hai.' (or something similar; this approach is often used online, especially by people who don't have access to a Hindi keyboard).

In this case, we're using the Latin script but still writing Hindi, so it would be acceptable to mark up either variation using the lang attribute:

<span lang="hi">आदमी खा रहा है।</span> or <span lang="hi">Aadmi kha raha hai.</span>

My question, then, is about languages that are normally written in the Latin alphabet themselves but may have phonetic guides for non-speakers or learners, either in IPA or as an ad hoc respelling. Is there any best practice for giving such a guide semantic meaning?

For example, in Irish if I were to say "The man is eating", I would say "Tá an fear ag ithe." I can mark this up as:

<span lang="ga">Tá an fear ag ithe.</span>

If I were to give a pronunciation guide for non-speakers, I might write "Taw on far eg ih-he". The sentence isn't meaningless in the way 'lorem ipsum' text is, but it isn't written in English or Irish either.

What is the correct use of the language-related attributes in HTML in this case, or is this use case simply not covered by the current specification?

anotherdave asked Jul 19 '12 11:07




1 Answer

Short version: if you want to specifically say it's written in the Latin alphabet, go for "hi-Latn" or "ga-Latn" for the examples you gave.

Long version:

The W3C spec for the lang attribute doesn't specifically mention this. It suggests some uses that depend on orthography (such as rendering high-quality versions of the characters used) and some that don't (such as helping search engines).
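For consumers such as search engines that only care about the language, adding a script subtag need not get in the way. One common approach is prefix matching, for instance the "basic filtering" scheme from RFC 4647 (a companion to the language tag spec, not discussed elsewhere in this answer); CSS's `:lang()` selector uses a comparable prefix rule. Below is a minimal Python sketch of that rule; the function name `basic_filter_matches` is hypothetical.

```python
def basic_filter_matches(language_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 basic filtering.

    A language range matches a tag if, compared case-insensitively, it is
    equal to the tag or equal to a prefix of the tag that is immediately
    followed by '-'. The special range '*' matches every tag.
    """
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True
    # Require a '-' after the prefix so that "hi" does not match "hil".
    return t == r or t.startswith(r + "-")
```

Under this rule a range of "hi" matches both `lang="hi"` and `lang="hi-Latn"`, so romanised Hindi is still found as Hindi, while a more specific range such as "hi-Latn" does not match plain "hi".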

RFC 1766, which originally specified the format for language tags (it has since been superseded), suggests that specialisations of tags may be used to represent "script variations, such as az-arabic and az-cyrillic". There's more about the script subtag in this article on the W3C site, and a bit extra in the later RFC 5646 (BCP 47). That one points to the ISO 15924 list of script codes, and in that list the code you'd want is "Latn", since these are romanised forms of text normally written in other scripts.

(This doesn't cover things like specifying how you did the transliteration, though, for languages which may have more than one standard e.g. Chinese in Latin script using Wade-Giles versus pinyin.)
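To make the script subtag concrete, here is a minimal Python sketch that splits a simple tag such as "hi-Latn" or "en-gb" into its language, script and region parts. It handles only a simplified subset of the RFC 5646 grammar (a two- or three-letter language, an optional four-letter script, an optional region) and ignores extended language subtags, variants, extensions and private-use tags. It also does not check codes against the IANA registry, and the name `parse_language_tag` is hypothetical.

```python
import re

# Simplified subset of the RFC 5646 "langtag" production (assumption:
# no extlang, variants, extensions, private use or grandfathered tags).
_SIMPLE_LANGTAG = re.compile(
    r"""
    (?P<language>[A-Za-z]{2,3})            # ISO 639 language code, e.g. hi, ga
    (?:-(?P<script>[A-Za-z]{4}))?          # ISO 15924 script code, e.g. Latn
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))? # ISO 3166-1 or UN M.49 region code
    """,
    re.VERBOSE,
)


def parse_language_tag(tag: str) -> dict:
    """Split a simple language tag into language, script and region."""
    m = _SIMPLE_LANGTAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a simple language tag: {tag!r}")
    script = m.group("script")
    region = m.group("region")
    # Normalise to the conventional casing: "hi", "Latn", "GB".
    return {
        "language": m.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }
```

With this, "hi-Latn" and "ga-Latn" both report "Latn" as the script while the language stays "hi" or "ga": the script subtag refines the tag rather than changing the language.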

bouteillebleu answered Oct 26 '22 05:10