
Numbers localization in Web applications


Here is an approach with code shifting:

// Eastern Arabic (officially "Arabic-Indic digits")
"0123456789".replace(/\d/g, function(v) {
    return String.fromCharCode(v.charCodeAt(0) + 0x0630);
});  // "٠١٢٣٤٥٦٧٨٩"

// Persian variant (officially "Eastern Arabic-Indic digits (Persian and Urdu)")
"0123456789".replace(/\d/g, function(v) {
    return String.fromCharCode(v.charCodeAt(0) + 0x06C0);
});  // "۰۱۲۳۴۵۶۷۸۹"

DEMO: http://jsfiddle.net/bKEbR/

Here we use a Unicode shift: the digits of each decimal group in Unicode are placed in the same order as in the Latin group (i.e. [0x0030 ... 0x0039]). So, for example, the shift for the Arabic-Indic group is 0x0630.

Note, it is difficult for me to distinguish Eastern characters, so if I've made a mistake (there are many different groups of Eastern characters in Unicode), you could always calculate the shift using any online Unicode table. You may use either official Unicode Character Code Charts, or Unicode Online Chartable.
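
The shift does not have to be looked up by hand: it is simply the distance between the target group's zero digit and the Latin '0'. Here is a minimal sketch of that idea, assuming the target group lists its ten digits contiguously and in ascending order. The helper name shiftDigits is made up for this illustration and is not part of the code above.

```javascript
// Hypothetical helper: derive the shift from the zero digit of the
// target group instead of hard-coding it.
function shiftDigits(text, zeroDigit) {
  var shift = zeroDigit.charCodeAt(0) - "0".charCodeAt(0);
  return text.replace(/[0-9]/g, function (d) {
    return String.fromCharCode(d.charCodeAt(0) + shift);
  });
}

console.log(shiftDigits("2012", "\u0660")); // zero of the Arabic-Indic group, shift 0x0630
console.log(shiftDigits("2012", "\u06F0")); // zero of the Persian group, shift 0x06C0
```

Because it relies on charCodeAt, this sketch only covers digit groups in the Basic Multilingual Plane.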


One has to decide if this is a question of appearance or of transformation. One must also decide if this is a question involving character-level semantics or numeral representations. Here are my thoughts:


The question would have entirely different semantics, if we had a situation where Unicode had not separated out the codes for numeric characters. Then, displaying the different glyphs as appropriate would simply be a matter of using the appropriate font. On the other hand, it would not have been possible to simply write out the different characters as I did below without changing fonts. (The situation is not exactly perfect as fonts do not necessarily cover the whole range of the 16-bit Unicode set, let alone the 32-bit Unicode set.)

9, ٩ (Arabic), ۹ (Urdu), 玖 (Chinese, complex), ๙ (Thai), ௯ (Tamil) etc.  

Now, assuming we accept Unicode semantics i.e. that '9' ,'٩', and '۹' are distinct characters, we may conclude that the question is not about appearance (something that would have been in the purview of CSS), but of transformation -- a few thoughts about this later, for now let us assume this is the case. When focusing on character-level semantics, the situation is not too dissimilar with what happens with alphabets and letters. For instance, Greek 'α' and Latin 'a' are considered distinct, even though the Latin alphabet is nearly identical to the Greek alphabet used in Euboea. Perhaps even more dramatically, the corresponding capital variants, 'Α' (Greek) and 'A' (Latin) are visually identical in practically all fonts supporting both scripts, yet distinct as far as Unicode is concerned.
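
To make the "distinct characters" point concrete, here is a small sketch that prints the code point of each numeral listed earlier and checks whether Unicode classifies it as a decimal digit. The function names are mine, and the \p{Nd} property escape assumes an engine with Unicode property escapes (any reasonably recent browser or Node).

```javascript
// Illustrative helpers (names are made up for this sketch).
function codePointOf(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

function isDecimalDigit(ch) {
  // \p{Nd} is the Unicode "decimal digit number" category; it needs the u flag.
  return /^\p{Nd}$/u.test(ch);
}

["9", "٩", "۹", "๙", "௯", "玖"].forEach(function (ch) {
  console.log(ch, codePointOf(ch), isDecimalDigit(ch));
});

// Greek capital alpha and Latin capital A look alike but compare unequal.
console.log("\u0391" === "\u0041");
```

The ideograph 玖 is a number in meaning only: it is not in the decimal digit category, which already hints that not every numeral system is a character-for-character swap.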

Having stated the ground rules, let us see how the question can be answered by ignoring them, and in particular ignoring (character-level) Unicode semantics.

(Horrible, nasty and non-backwards compatible) Solution: Use fonts that map '0' to '9' to the desired glyphs. I am not aware of any such fonts. You would have to use @font-face and some font that has been appropriately hacked to do what you want.

Needless to say, I am not particularly fond of this solution. However, it is the only simple solution I am aware of that does what the question asks "without changing character codes" on either the server or the client side. (Technically speaking the Cufon solution I propose below does not change the character codes either, but what it does, drawing text into canvases is vastly more complex and also requires tweaking open-source code).


Note: Any transformational solution i.e. any solution that changes the DOM and replaces characters in the range '0' to '9' to, say, their Arabic equivalents will break code that expects numerals to appear in their original form in the DOM. This problem is, of course, worst when discussing forms and inputs.

An example of an answer taking the transformational approach would be:

// Note: andSelf() was removed in jQuery 3.0; use addBack() there instead.
$("[lang='fa']").find("*").andSelf().contents().each(function() {
    // nodeType 3 is a text node
    if (this.nodeType === 3) {
        this.nodeValue = this.nodeValue.replace(/\d/g, function(v) {
            return String.fromCharCode(v.charCodeAt(0) + 0x0630);
        });
    }
});

Note: Code taken from VisioN's second jsFiddle. If this is the only part of this answer that you like, make sure you upvote VisioN's answer, not mine!!! :-)

This has two problems:

  1. It messes with the DOM and as a result may break code that used to work assuming it would find numerals in the "standard" form (using digits '0' to '9'). See the problem here: http://jsfiddle.net/bKEbR/10/ For instance, if you had a field containing the sum of some integers the user inputs, you might be in for a surprise when you try to get its value...
  2. It does not address the issue of what goes on inside input (and textarea) elements. If an input field is initialised with, say, "42", it will retain that value. This can be fixed easily, but then there is the issue of actual input... One may decide to change characters as they come, convert the values when they change and so on and so forth. If such conversion is made then both the client side and the server side will need to be prepared to deal with different kinds of numeral. What comes out of the box in JavaScript, jQuery and even Globalize (client-side), and ASP.NET, PHP etc. (server-side) will break if fed with numerals in non-standard formats ...

A slightly more comprehensive solution (taking care also of input/textarea elements, both their initial values and user input) might be:

//before the DOM change, test1 holds a numeral parseInt can understand
alert("Before: test holds the value:" +parseInt($("#test1").text()));

function convertNumChar(c) {
   return String.fromCharCode(c.charCodeAt(0) + 0x0630);
}

function convertNumStr(s) {
    return s.replace(/\d/g, convertNumChar);
}

//the change in the DOM
$("[lang='fa']").find("*").andSelf().contents()
    .each(function() {
        if (this.nodeType === 3)        
           this.nodeValue = convertNumStr(this.nodeValue);      
    })
    .filter("input:text,textarea")
    .each(function() {
         this.value = convertNumStr(this.value)
     })
     .change(function () {this.value = convertNumStr(this.value)});      

//test1 now holds a numeral parseInt cannot understand
alert("After: test holds the value:" +parseInt($("#test1").text()))

The entire jsFiddle can be found here: http://jsfiddle.net/bKEbR/13/

Needless to say, this only solves the aforementioned problems partially. Client-side and/or server-side code will have to recognise the non-standard numerals and convert them appropriately either to the standard format or to their actual values.

This is not a simple matter that a few lines of javascript will solve. And this is but the simplest case of such possible conversion since there is a simple character-to-character mapping that needs to be applied to go from one form of numeral to the other.
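
For this simplest case, the conversion back to the standard form could look roughly like the sketch below. It only knows about the Arabic-Indic and Persian digit groups, and the helper name toAsciiDigits is an assumption of mine, not something jQuery or Globalize provides.

```javascript
// Hypothetical helper: fold Arabic-Indic (U+0660..U+0669) and Persian
// (U+06F0..U+06F9) digits back to '0'..'9' so that parseInt can read them.
function toAsciiDigits(text) {
  return text
    .replace(/[\u0660-\u0669]/g, function (d) {
      return String.fromCharCode(d.charCodeAt(0) - 0x0630);
    })
    .replace(/[\u06F0-\u06F9]/g, function (d) {
      return String.fromCharCode(d.charCodeAt(0) - 0x06C0);
    });
}

var shifted = "\u0664\u0662";                      // "42" written with Arabic-Indic digits
console.log(parseInt(shifted, 10));                // NaN
console.log(parseInt(toAsciiDigits(shifted), 10)); // 42
```

Every place that reads a value back (form handlers, validation, the server) would need to apply something like this before treating the text as a number.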


Another go at an appearance-based approach:

Cufon-based Solution (Overkill, Non-Backwards Compatible (requires canvas), etc.): One could relatively easily tweak a library like Cufon to do what is envisaged. Cufon can do its thing and draw glyphs on a canvas object, except that the tweak will ensure that when elements have a certain property, the desired glyphs will be used instead of the ones normally chosen. Cufon and other libraries of the kind tend to add elements to the DOM and alter the appearance of existing elements but not touch their text, so the problems with the transformational approaches should not apply. In fact it is interesting to note that while (tweaked) Cufon provides a clearly transformational approach as far as the overall DOM is concerned, it is an appearance-based solution as far as its mentality goes; I would call it a hybrid solution.

Alternative Hybrid-Solution: Create new DOM elements with the Arabic content, hide the old elements but leave their ids and content intact. Synchronize the Arabic content elements with their corresponding, hidden, elements.


Let's try to think outside the box (the box being current web standards).

The fact that certain characters are unique does not mean they are unrelated. Moreover, it does not necessarily mean that their difference is one of appearance. For instance, 'a' and 'A' are the same letter; in some contexts they are considered to be the same and in others to be different. Having the distinction in Unicode (and ASCII and ISO-Latin-1 etc. before it) means that some effort is required to overcome it. CSS offers a quick and easy way of changing the case of letters. For instance, body {text-transform:uppercase} would turn all letters in the text in the body of the page into upper case. Note that this is also a case of appearance-change rather than transformation: the DOM of the body element does not change, just the way it is rendered.

Note: If CSS supported something like numerals-transform: 'ar' that would probably have been the ideal answer to the question as it was phrased.

However, before we rush to tell the CSS committee to add this feature, we may want to consider what that would mean. Here, we are tackling a tiny little problem, but they have to deal with the big picture.

Output: Would this numerals-transform feature allow '10' (2 characters) to appear as 十 (Chinese, simple), 拾 (Chinese, complex), X (Latin) (all 1 character) and so on if, instead of 'ar', the appropriate arguments were given?

Input: Would this numerals-transform feature change '十' (Chinese, simple) into its Arabic equivalent, or would it simply target '10'? Would it somehow cleverly detect that "MMXI" (Latin numeral for 2011) is a number and not a word and convert it accordingly?

The question of number representation is not as simple as one might imagine just looking at this question.
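
To illustrate why such systems are harder than the Arabic case, here is a rough sketch of a Latin (Roman) numeral converter. It is only an illustration of the point that the output is computed from the value of the number rather than from its individual digits; it assumes whole numbers from 1 to 3999.

```javascript
// Illustration only: Roman numerals cannot be produced by shifting each
// digit, so the conversion has to work on the numeric value.
function toRoman(n) {
  var table = [
    [1000, "M"], [900, "CM"], [500, "D"], [400, "CD"],
    [100, "C"], [90, "XC"], [50, "L"], [40, "XL"],
    [10, "X"], [9, "IX"], [5, "V"], [4, "IV"], [1, "I"]
  ];
  var out = "";
  table.forEach(function (pair) {
    // Greedily take the largest symbol that still fits.
    while (n >= pair[0]) {
      out += pair[1];
      n -= pair[0];
    }
  });
  return out;
}

console.log(toRoman(10));   // "X"
console.log(toRoman(1999)); // "MCMXCIX"
```

Here '10' becomes a single character and '1999' becomes seven, which no per-character replacement could do.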


So, where does all this leave us:

  1. There is no simple presentation-based solution. If one appears in the future, it will not be backwards compatible.
  2. There can be a transformational "solution" here and now, but even if this is made to work also with form elements as I have done (http://jsfiddle.net/bKEbR/13/) there need to be server-side and client-side awareness of the non-standard format used.
  3. There may be complex hybrid solutions. They are complex but offer some of the advantages of the presentation-based approaches in some cases.

A CSS solution would be nice, but actually the problem is big and complex when one looks at the big picture, which involves other numeric systems (with less trivial conversions from and to the standard system), decimal points, signs etc.

At the end of the day, the solution I see as realistic and backwards compatible would be an extension of Globalize (and server-side equivalents) possibly with some additional code to take care of user input. The idea is that this is not a problem at the character-level (because once you consider the big picture it is not) and that it will have to be treated in the same way that differences with thousands and decimal separators have been dealt with: as formatting/parsing issues.
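
As a rough sketch of what "formatting/parsing" could mean in practice, the snippet below uses the built-in Intl.NumberFormat (rather than Globalize) to discover a locale's digits and separators, and then parses formatted text back into a number. The name makeNumberCodec is invented for this example, and the parser is deliberately naive: it assumes plain decimal numbers and does not handle locale-specific minus signs, percent signs or currency.

```javascript
// Sketch: digits are handled next to the thousands and decimal separators,
// as one formatting/parsing concern. makeNumberCodec is a made-up name.
function makeNumberCodec(locale) {
  var formatter = new Intl.NumberFormat(locale);

  // Discover the locale's separators from a sample number.
  var parts = formatter.formatToParts(12345.6);
  function partValue(type) {
    var match = parts.filter(function (p) { return p.type === type; })[0];
    return match ? match.value : "";
  }
  var group = partValue("group");
  var decimal = partValue("decimal");

  // Discover the locale's ten digits; index i holds the digit for value i.
  var digits = Array.from(
    new Intl.NumberFormat(locale, { useGrouping: false }).format(9876543210)
  ).reverse();

  return {
    format: function (value) { return formatter.format(value); },
    parse: function (text) {
      var canonical = "";
      Array.from(text).forEach(function (ch) {
        var digit = digits.indexOf(ch);
        if (digit !== -1) canonical += digit;
        else if (ch === decimal) canonical += ".";
        else if (ch !== group) canonical += ch; // unknown characters make Number() return NaN
      });
      return Number(canonical);
    }
  };
}

// "-u-nu-arab" explicitly requests Arabic-Indic digits.
var codec = makeNumberCodec("ar-EG-u-nu-arab");
var shown = codec.format(1234567.25);
console.log(shown, codec.parse(shown));
```

With something like this, the DOM and the server keep exchanging ordinary numbers, and only the text shown to or typed by the user is in the local form.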


I imagine the best way is to add a class name to each div that needs a different numeric set, and then use a regexp to find and replace the numeric characters inside those divs.

You can do this fairly easily using jQuery.

jsfiddle DEMO


EDIT: And if you don't want to use a variable, then see this revised demo:

jsfiddle DEMO 2


I have been working on a general web page localization technique that does more than just numbers (it's similar to .po files).

The localization files are simple (the strings can contain html if needed)

/* Localization file - save as document_url-lang.js, e.g. index.html-en.js: */
items=[
{"id":"string1","value":"Localized text of string1 here."},
{"id":"string2", "value":"۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ "}
];
rtl=false; /* set to true for rtl languages */

This format is useful to separate out for translators (or mechanical turk)

and a basic page template

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>My title</title>
<style>.txt{float:left;margin-left:10px}</style>
</head>
<body onload='setLang()'>
<div id="string1" class="txt">This is the default text of string1.</div>
<div id="string2" class="txt">0 1 2 3 4 5 6 7 8 9 </div>
<script>
   function setLang(){
      /* keep the default text if no localization file was loaded */
      if(typeof items=="undefined")return
      for(var i=0;i<items.length;i++){
         var term=document.getElementById(items[i].id)
         if(!term)continue  /* no element with this id on the page */
         term.innerHTML=items[i].value
         if(rtl){  /* for rtl languages */
            term.style.styleFloat="right"
            term.style.cssFloat="right"
            term.style.textAlign="right"
         }
      }
   }
   /* load the localization file for the browser language, e.g. index.html-en.js */
   var lang=navigator.userLanguage || navigator.language;
   var script=document.createElement("script");
   script.src=document.URL+"-"+lang.substring(0,2)+".js"
   var head = document.getElementsByTagName('head')[0]
   head.insertBefore(script,head.firstChild)
</script>
</body></html>

I tried to keep it pretty simple, yet cover as many locales as possible so additional css is likely required (I have to admit a lack of exposure to rtl languages, so many more styles may need to be set)

I do have font checking code that would be useful if you know what fonts support your character codes well

function hasFont(f){
    // Measure a test string in a fallback font, then again with f first in
    // the list; a change in width suggests that f is available.
    var fallback=(f=="monospace")?"sans-serif":"monospace"
    var s=document.createElement("span")
    s.style.fontSize="72px"
    s.innerHTML="MWMWM"
    s.style.visibility="hidden"
    s.style.fontFamily=fallback
    document.body.appendChild(s)
    var w=s.offsetWidth
    s.style.fontFamily=f+","+fallback
    var changed=s.offsetWidth!=w
    document.body.removeChild(s)  // clean up the probe element
    return changed
}

usage: if(hasFont("myfont"))myelement.style.fontFamily="myfont"


A new (to date) and simple JS solution would be to use Intl.NumberFormat. It supports numeral localization, formatting variations as well as local currencies (see documentation for more examples).

To use an example very similar to MDN's own:

const val = 1234567809;
console.log('Eastern Arabic (Arabic-Egyptian)', new Intl.NumberFormat('ar-EG').format(val));
console.log('Persian variant (Farsi)',new Intl.NumberFormat('fa').format(val));
console.log('English (US)',new Intl.NumberFormat('en-US').format(val));

Intl.NumberFormat also seems to accept numeric strings, and when the input is not a number it says so in the local language.

const val1 = '456';
const val2 = 'Numeric + string example, 123';
console.log('Eastern Arabic', new Intl.NumberFormat('ar-EG').format(val1));
console.log('Eastern Arabic', new Intl.NumberFormat('ar-EG').format(val2));
console.log('Persian variant',new Intl.NumberFormat('fa').format(val1));
console.log('Persian variant',new Intl.NumberFormat('fa').format(val2));
console.log('English',new Intl.NumberFormat('en-US').format(val1));
console.log('English', new Intl.NumberFormat('en-US').format(val2));

For the locale identifier (string passed to NumberFormat constructor indicating locale), I experimented with the values above and they seemed fine. I tried finding a list for all possible values, and through MDN came across this documentation and this list that could be helpful.
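
One more thing I noticed about the locale identifier: it can carry a "-u-nu-" Unicode extension that selects the digits independently of the language. The sketch below is my own small wrapper (formatWithDigits is not a built-in), and it assumes the runtime ships full locale data, as current browsers and Node builds do.

```javascript
// Made-up wrapper: pick the numbering system via the "-u-nu-" locale extension.
// "arab" = Arabic-Indic digits, "arabext" = Persian/Urdu digits, "thai" = Thai digits.
function formatWithDigits(value, numberingSystem) {
  var formatter = new Intl.NumberFormat("en-u-nu-" + numberingSystem, {
    useGrouping: false
  });
  return formatter.format(value);
}

["arab", "arabext", "thai", "latn"].forEach(function (system) {
  console.log(system, formatWithDigits(2012, system));
});

// resolvedOptions() reports which numbering system was actually chosen.
console.log(new Intl.NumberFormat("fa").resolvedOptions().numberingSystem);
```

This can be handy when the page language and the desired numerals do not match, for example English text that should still show Persian digits.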

I'm not familiar with SEO, and am thus unsure how this answers that part of the question.