
How to find whether a particular string has Unicode characters (esp. double-byte characters)

To be more precise, I need to know whether (and if possible, how) I can find out if a given string contains double-byte characters. Basically, I need to open a pop-up to display a given text, which can contain double-byte characters such as Chinese or Japanese. In that case, the window needs to be sized differently than it would be for English or ASCII text. Does anyone have a clue?

Jay asked Sep 29 '08 07:09



2 Answers

I used mikesamuel's answer for this one. However, I noticed (perhaps because of this form) that there should be only one backslash before the u, i.e. \u and not \\u, for it to work correctly.

function containsNonLatinCodepoints(s) {
    return /[^\u0000-\u00ff]/.test(s);
}

Works for me :)
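
A quick check of the function above may help show where the cutoff falls. The sample strings are chosen here for illustration and are not from the original answer; note that accented Latin-1 letters such as é still count as "Latin" under this test.

```javascript
// True if any UTF-16 code unit falls outside U+0000-U+00FF.
function containsNonLatinCodepoints(s) {
    return /[^\u0000-\u00ff]/.test(s);
}

console.log(containsNonLatinCodepoints("Hello, world"));       // false
console.log(containsNonLatinCodepoints("caf\u00e9"));          // false: é is U+00E9, inside the range
console.log(containsNonLatinCodepoints("\u65e5\u672c\u8a9e")); // true: 日本語
```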

james answered Sep 23 '22 07:09


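JavaScript holds text internally as 16-bit code units (UCS-2/UTF-16). A single code unit can represent any character in the Basic Multilingual Plane, a fairly extensive subset of Unicode; characters beyond it are stored as surrogate pairs.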

But that's not really germane to your question. One solution might be to loop through the string and examine the character codes at each position:

function isDoubleByte(str) {
    for (var i = 0, n = str.length; i < n; i++) {
        if (str.charCodeAt(i) > 255) { return true; }
    }
    return false;
}

This might not be as fast as you would like.
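
For the window-sizing use case in the question, the same threshold can be turned into a rough width estimate. This is only a sketch under a loud assumption: every character above 255 is treated as twice as wide as the others, which is a crude heuristic (Greek or Cyrillic letters, for example, are above 255 but not wide). The name `estimateColumns` is hypothetical and does not come from either answer.

```javascript
// Hypothetical helper: rough display width in "columns".
// Assumption: code points above 255 take two columns, everything else one.
function estimateColumns(str) {
    let columns = 0;
    // for...of iterates by code point, so a surrogate pair is counted once.
    for (const ch of str) {
        columns += ch.codePointAt(0) > 255 ? 2 : 1;
    }
    return columns;
}

console.log(estimateColumns("abc"));          // 3
console.log(estimateColumns("\u65e5\u672c")); // 4 (日本)
console.log(estimateColumns("a\u65e5"));      // 3
```

The result could then be multiplied by an average character width in pixels to size the pop-up; measuring the rendered text in the browser would be more accurate where that is an option.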

pcorcoran answered Sep 23 '22 07:09