Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do Arabic characters behave as separate characters when styling single Arabic character?

Basically what I am trying to accomplish is Arabic characters misuse highlighter !

To make it easy for understand I will try to explain a similar functionality but for English.

Imagine a string with wrong capitalization, and it is required to rewrite it correctly, so the user rewrites the string in an input box and submits, the js checks to see if any char wasn't corrected then it displays the whole string with those letter corrected and highlighted in red;

i.e. [test ] becomes [Test ]

To do so, I was checking those chars, and if faulty char was detected it get surrounded with span to be colored in red.

So far so good, now when I try to replicate this for Arabic language the faulty char gets separated from the word making it unreadable.


Demo: jsfiddle

function check1() {
  englishanswer.innerHTML = englishWord.value.replace(/t/, '<span style="color:red">T</span>');
}

function check2() {
  arabicanswer.innerHTML =
    arabicWord.value.replace(/\u0647/, '<span style="color:red">' +
      unescape("%u0629") + '</span>') +
    '<br>' + arabicWord.value.replace(/\u0647/, unescape('%u0629'));
}
fieldset {
  border: 2px groove threedface;
  border-image: initial;
  width: 75%;
}
input {
  padding: 5px;
  margin: 5px;
  font-size: 1.25em;
}
p {
  padding: 5px;
  font-size: 2em;
}
<fieldset>
  <legend>English:</legend>
  <input id='englishWord' value='test' />
  <input type='submit' value='Check' onclick='check1()' />
  <p id='englishanswer'></p>
</fieldset>

<fieldset style="direction:rtl">
  <legend>عربي</legend>
  <input id='arabicWord' value='بطله' />
  <input type='submit' value='Check' onclick='check2()' />
  <p id='arabicanswer'></p>
</fieldset>

Notice when testing the Arabic word, the spanned char [first preview] is separated from the rest of the word, while the non-spanned char [second preview] appears normally.


Edit: Preview for the problem [Chrome UA]

enter image description here

like image 824
Mohammed Ibrahim Avatar asked Oct 14 '12 21:10

Mohammed Ibrahim


2 Answers

This is a longstanding bug in WebKit browsers (Chrome, Safari): HTML markup breaks joining behavior. Explicit use of ZWJ (zero-width joiner) used to help (see question Partially colored Arabic word in HTML), but it seems that the bug has become worse.

As a clumsy (but probably the only) workaround, you could use contextual forms for Arabic letters. This can be tested first using just static HTML markup and CSS, e.g.

بطﻠ<span style="color:red">ﺔ</span>

Here I am using, inside the span element, ﺔ U+FE94 ARABIC LETTER TEH MARBUTA FINAL FORM instead of the normal U+0629 ARABIC LETTER TEH MARBUTA and ﻠ U+FEE0 ARABIC LETTER LAM MEDIAL FORM instead of U+0644 ARABIC LETTER LAM.

To implement this in JavaScript, you would need, when inserting markup into a word Arabic letters, change characters before and after the break (caused by markup) to initial, medial, or final representation form according to its position in the word.

like image 55
Jukka K. Korpela Avatar answered Sep 29 '22 08:09

Jukka K. Korpela


i know that this solution i'm giving you is not very elegant but it kinda works so tell me what you think:

<script>
    function check1(){
    englishanswer.innerHTML = englishWord.value.replace(/t/,'<span style="color:red">T</span>');
}
function check2(){
arabicanswer.innerHTML = 
    arabicWord.value.replace(/\u0647/,'<span style="color:red">'+
    unescape("%u0640%u0629")+'</span>')+
    '<br>'+arabicWord.value.replace(/\u0647/,unescape('%u0629'));
}
</script>

<fieldset>
<legend>English:</legend>
<input id='englishWord' value='test'/>
<input type='submit' value='Check' onclick='check1()'/>
<p id='englishanswer'></p>
</fieldset>

<fieldset style="direction:rtl">
<legend>عربي</legend>
<input id='arabicWord' value='بطلـه'/>
<input type='submit' value='Check' onclick='check2()'/>
<p id='arabicanswer'></p>
</fieldset>
like image 30
Rachid O Avatar answered Sep 29 '22 06:09

Rachid O