
JavaScript: how to check if character is RTL?

How can I programmatically check if the browser treats some character as RTL in JavaScript?

Maybe creating some transparent DIV and looking at where text is placed?

A bit of context. Unicode 5.2 added Avestan alphabet support. So, if the browser has Unicode 5.2 support, it treats characters like U+10B00 as RTL (currently only Firefox does). Otherwise, it treats these characters as LTR, because this is the default.

How do I programmatically check this? I'm writing an Avestan input script and I want to override the bidi direction if the browser is too dumb. But if the browser does support Unicode 5.2, the bidi settings shouldn't be overridden (since leaving them alone allows mixing Avestan and Cyrillic).

I currently do this:

var ua = navigator.userAgent.toLowerCase();

if (ua.match('webkit') || ua.match('presto') || ua.match('trident')) {
    var input = document.getElementById('orig');
    if (input) {
        input.style.direction = 'rtl';
        input.style.unicodeBidi = 'bidi-override';
    }
}

But, obviously, this would make the script less usable once Chrome and Opera start supporting Unicode 5.2.

Kryzhovnik asked Aug 17 '12 12:08




4 Answers

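function isRTL(s) {
    // Ranges treated as strongly left-to-right.
    var ltrChars = 'A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u0300-\u0590\u0800-\u1FFF' +
                   '\u2C00-\uFB1C\uFDFE-\uFE6F\uFEFD-\uFFFF',
        // Ranges treated as right-to-left (Hebrew, Arabic and related blocks).
        rtlChars = '\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC',
        // Skip any non-LTR characters at the start, then require an RTL character.
        rtlDirCheck = new RegExp('^[^' + ltrChars + ']*[' + rtlChars + ']');

    return rtlDirCheck.test(s);
}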


vsync answered Sep 20 '22 18:09


I realize this is quite a while after the original question was asked and answered, but I found vsync's update rather useful and just wanted to add some observations. I would add this as a comment on his answer, but my reputation is not high enough yet.

Instead of a regular expression that matches, from the start of the string, zero or more non-LTR characters and then one RTL character, wouldn't it make more sense to match zero or more weak/neutral characters and then one RTL character? Otherwise you have the potential for matching many RTL characters unnecessarily. I would welcome a more thorough examination of my weak/neutral character group, as I merely used the negation of the combined LTR and RTL character groups.

Additionally, shouldn't characters such as LTR/RTL marks, embeds, overrides be included in the appropriate character groupings?

I would think then that the final code should look something like:

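function isRTL(s) {
    // Weak/neutral characters. The first General Punctuation range stops at
    // U+200D so that the LRM/RLM marks (U+200E, U+200F) are not counted as weak;
    // the original '\u2000-\u2BFF' swallowed every mark and override.
    var weakChars = '\u0000-\u0040\u005B-\u0060\u007B-\u00BF\u00D7\u00F7\u02B9-\u02FF' +
                    '\u2000-\u200D\u2010-\u2029\u202C\u202F-\u2BFF',
        // RTL ranges plus the RLM mark, RLE embedding and RLO override.
        rtlChars = '\u0591-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC',
        // Skip leading weak characters, then require an RTL character.
        rtlDirCheck = new RegExp('^[' + weakChars + ']*[' + rtlChars + ']');

    return rtlDirCheck.test(s);
}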

Update

There may be some ways to speed up the above regular expression. Using a negated character class with a lazy quantifier seems to help improve speed (tested on http://regexhero.net/tester/?id=6dab761c-2517-4d20-9652-6d801623eeec, site requires Silverlight 5)

Additionally, if the directionality of the string is unknown, my guess is that in most cases the string will be LTR rather than RTL, so an isLTR function would return results faster. But since the OP asked for isRTL, here is an isRTL function:

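function isRTL(s) {
    // RTL ranges plus the RLM mark, RLE embedding and RLO override.
    var rtlChars = '\u0591-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC',
        // Lazily skip non-RTL characters, then require one RTL character.
        rtlDirCheck = new RegExp('^[^' + rtlChars + ']*?[' + rtlChars + ']');

    return rtlDirCheck.test(s);
}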
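
One caveat for the Avestan case in the question: every range above stops at U+FFFF, and without the u flag a character such as U+10B00 is seen as a surrogate pair, so none of these patterns can match it. Below is a minimal sketch of one way around this. It assumes an engine that supports the ES2015 u flag, and it assumes that the supplementary ranges U+10800-U+10FFF and U+1E800-U+1EFFF can be treated as right-to-left (my reading of the Unicode bidi defaults, not something stated in this thread). The names rtlAnyPlane and containsRTL are made up for illustration.

```javascript
// Same BMP ranges as in the function above, plus two supplementary-plane
// ranges that are ASSUMED here to be right-to-left (Avestan sits at
// U+10B00-U+10B3F). The u flag makes the regex work on whole code points.
var rtlAnyPlane = /[\u0591-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC\u{10800}-\u{10FFF}\u{1E800}-\u{1EFFF}]/u;

// Hypothetical helper: true if the string contains at least one such character.
function containsRTL(s) {
    return rtlAnyPlane.test(s);
}

var avestanA = String.fromCodePoint(0x10B00); // AVESTAN LETTER A
console.log(containsRTL(avestanA)); // true
console.log(containsRTL('abc'));    // false
```

This is a "contains" test, not a "first strong character" test, and it is only an approximation: not every code point in those ranges is a strong RTL letter.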

mcarthurart answered Sep 18 '22 18:09


This tests for both Hebrew and Arabic (the only modern right-to-left scripts I know of, apart from anything Persian-related, which I have not researched):

/[\u0590-\u06FF]/.test(textarea.value)

More research suggests something along the lines of:

/[\u0590-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC]/.test(textarea.value)

jimmont answered Sep 17 '22 18:09


First addressing the question in the heading:

There are no tools in JavaScript as such for accessing Unicode properties of characters. You would need to find a library or service for the purpose (I’m afraid that might be difficult, if you need something reliable) or to extract the relevant information from the Unicode character “database” (a collection of text files in specific formats) and to write your own code to use it.
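
For what it's worth, newer JavaScript engines (ES2018 and later) do expose some Unicode properties through regular-expression property escapes such as \p{Script=...}. To my knowledge the bidirectional class itself is not among them, so the sketch below uses the script as a rough stand-in. The script list and the names rtlScripts and isRtlScriptChar are assumptions for illustration, and the list is not exhaustive.

```javascript
// Scripts ASSUMED to be written right-to-left; extend the list as needed.
var rtlScripts = /^[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}\p{Script=Avestan}]/u;

// Hypothetical helper: true if the first code point of s belongs to one of
// the scripts listed above.
function isRtlScriptChar(s) {
    return rtlScripts.test(s);
}

console.log(isRtlScriptChar('\u05D0'));                      // true  (Hebrew alef)
console.log(isRtlScriptChar(String.fromCodePoint(0x10B00))); // true  (Avestan letter a)
console.log(isRtlScriptChar('a'));                           // false
```

Note that this reports what the JavaScript engine's Unicode data says, which need not match how the browser's text layout actually orders the characters.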

Then the question in message body:

This seems even more desperate. But as this would probably be something for a limited number of users who are knowledgeable and know Avestan, maybe it would not be too bad to display a string of Avestan characters along with an image of them in proper directionality and ask the user to click a button if the order is wrong. And you could save this selection in a cookie, so that the user needs to do this only once (per browser; though it should be a relatively short-lived cookie, as the browser may get updated).
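
The layout-probing idea floated in the question (render something invisible and see where it lands) might also stand in for the manual check. Below is a minimal sketch, assuming a browser DOM and assuming that two adjacent inline spans take part in the same bidi run, so that two RTL characters come out in reversed visual order. The helper name rendersRightToLeft is hypothetical, and this is untested across browsers.

```javascript
// Hypothetical helper: returns true if the browser lays out two copies of ch
// right-to-left inside an LTR container, false if left-to-right, and null
// when there is no DOM to measure against.
function rendersRightToLeft(ch, doc) {
    if (doc === undefined) {
        doc = typeof document !== 'undefined' ? document : null;
    }
    if (!doc || !doc.body) return null;

    var probe = doc.createElement('div');
    probe.style.cssText = 'position:absolute;visibility:hidden;' +
                          'white-space:nowrap;direction:ltr;unicode-bidi:normal';
    var first = doc.createElement('span');
    var second = doc.createElement('span');
    first.textContent = ch;
    second.textContent = ch;
    probe.appendChild(first);
    probe.appendChild(second);
    doc.body.appendChild(probe);

    // In an RTL run the logically first character ends up further right.
    var isRtl = first.getBoundingClientRect().left >
                second.getBoundingClientRect().left;

    doc.body.removeChild(probe);
    return isRtl;
}

// Usage sketch: force the direction only when Avestan is laid out LTR.
var origInput = typeof document !== 'undefined' ? document.getElementById('orig') : null;
if (origInput && rendersRightToLeft(String.fromCodePoint(0x10B00)) === false) {
    origInput.style.direction = 'rtl';
    origInput.style.unicodeBidi = 'bidi-override';
}
```

Zero-width characters (combining marks, for example) would give equal positions and therefore false, so the probe is only meaningful for characters that take up space.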

Jukka K. Korpela answered Sep 20 '22 18:09