Detecting Japanese Full-Width Characters

Tags:

javascript

I need to prevent users from typing full-width Japanese characters into an input field. Half-width characters are fine, and so is any other symbol, as long as it is not full-width.

I found a regex for full-width katakana (zenkaku 全角) at https://gist.github.com/terrancesnyder/1345094. Is it enough? Currently my code looks like this:

if (/[ァ-ヶ]/.test("カナ")) {
  console.log('full width');
} else {
  console.log('not full width');
}

I'm not familiar with Japanese, so I don't know what else I have to check. The language has katakana, hiragana and so on, which is why I'm not sure my script is good enough. Please let me know what you think.

dave101ua asked May 20 '16 15:05


2 Answers

Japanese uses many different kinds of characters, for example:

  • ひらがな (hiragana): あいうえお
  • カタカナ (katakana): アイウエオ
  • 半角カタカナ (half-width katakana): ｱｲｳｴｵ
  • 漢字 (kanji): 安以宇衣於
  • 全角数字 (full-width numbers): １２３４５
  • 全角アルファベット (full-width alphabet): ＡＢＣＤＥ
  • 記号 (symbols): ○△□〜☆≠≧

A simple regular expression cannot detect all of these. Character-width variations also exist in other Asian languages. Unicode defines this property as "East Asian Width".

In Python, the unicodedata module is often used to determine the East Asian Width. JavaScript has nothing equivalent in its standard library.

However, there are some npm modules. With the East Asian Width module (eastasianwidth), you can check the width like this:

var eaw = require('eastasianwidth');

// A character is treated as half-width when its East Asian Width length is 1.
function isHalfWidth(c) { return eaw.length(c) === 1; }

isHalfWidth("あ")
// -> false
isHalfWidth("ｱ")
// -> true
isHalfWidth("Ａ")
// -> false
isHalfWidth("A")
// -> true
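
If pulling in an npm module is not an option, a rough dependency-free approximation is to test against a handful of Unicode blocks. This is only a sketch: the choice of blocks below is my assumption about what should count as "full-width" here, it is not the complete East Asian Width table, and `containsFullWidth` is a hypothetical helper name.

```javascript
// Rough approximation only: the blocks listed here are an assumption about
// what counts as "full-width", not the full East Asian Width data.
const FULL_WIDTH_PATTERN = new RegExp(
  '[' +
  '\\u3000-\\u303F' + // CJK symbols and punctuation (includes the ideographic space)
  '\\u3040-\\u309F' + // hiragana
  '\\u30A0-\\u30FF' + // full-width katakana
  '\\u4E00-\\u9FFF' + // CJK unified ideographs (kanji)
  '\\uFF01-\\uFF60' + // full-width ASCII variants (digits, letters, punctuation)
  '\\uFFE0-\\uFFE6' + // full-width currency and similar signs
  ']'
);

// Hypothetical helper: true if the text contains at least one character
// from the blocks above. Half-width katakana (U+FF61 to U+FF9F) is not matched.
function containsFullWidth(text) {
  return FULL_WIDTH_PATTERN.test(text);
}

console.log(containsFullWidth('カナ'));    // full-width katakana
console.log(containsFullWidth('ｶﾅ'));      // half-width katakana
console.log(containsFullWidth('abc 123')); // plain ASCII
```

Symbols such as ○ or △ from the list above fall outside these blocks, so this sketch lets them through; a module backed by the real East Asian Width data is the safer choice when that matters.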
mjy answered Oct 18 '22 02:10


mbStrWidth('過'); // returns 2 ---> full width
mbStrWidth('ｻ'); // returns 1 ---> half width

// Read more: http://php.net/manual/en/function.mb-strwidth.php
function mbStrWidth(input) {
  let len = 0;
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i);
    if ((code >= 0x0020 && code <= 0x1FFF) || (code >= 0xFF61 && code <= 0xFF9F)) {
      len += 1; // half-width, including half-width katakana (U+FF61 to U+FF9F)
    } else if ((code >= 0x2000 && code <= 0xFF60) || code >= 0xFFA0) {
      len += 2; // full-width
    }
    // Code units below U+0020 (control characters) add nothing.
  }
  return len;
}
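
To apply this to the original question (rejecting input that contains any full-width character), the same ranges can be checked one code unit at a time. This is a minimal sketch under that assumption; `codeUnitWidth` and `hasFullWidth` are hypothetical helper names, not part of any library.

```javascript
// Same ranges as mbStrWidth, applied to a single UTF-16 code unit.
function codeUnitWidth(code) {
  if ((code >= 0x0020 && code <= 0x1FFF) || (code >= 0xFF61 && code <= 0xFF9F)) {
    return 1; // half-width
  }
  if ((code >= 0x2000 && code <= 0xFF60) || code >= 0xFFA0) {
    return 2; // full-width
  }
  return 0; // control characters below U+0020
}

// Hypothetical helper: true if any code unit in the input is counted as width 2.
function hasFullWidth(input) {
  for (let i = 0; i < input.length; i++) {
    if (codeUnitWidth(input.charCodeAt(i)) === 2) {
      return true;
    }
  }
  return false;
}

console.log(hasFullWidth('abc'));  // ASCII only
console.log(hasFullWidth('過'));   // kanji
console.log(hasFullWidth('ｻ'));    // half-width katakana
```

The ranges are coarse: everything from U+2000 upward (other than half-width katakana) counts as width 2, so a curly apostrophe such as U+2019 would also be rejected. Characters outside the Basic Multilingual Plane are seen as two surrogate code units, each counted as width 2.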
Vui Dang answered Oct 18 '22 03:10