I have a Japanese project that needs to validate half-width and full-width Japanese characters: 14 characters are allowed in half-width and 7 characters in full-width.
Does anyone know how to implement that?
Right now, this in my model
class Customer
  validates_length_of :name, :maximum => 14
end
is not a good choice.
I'm currently using RoR 2.3.5. Both fullwidth and halfwidth can be used.
First of all, the concept of fullwidth (全角) and halfwidth (半角) exists only for two types of characters in Japanese: Katakana and Roman (Latin) characters.
A similar concept exists for Korean Hangul, but not for Japanese Hiragana, nor for Kanji.
For Katakana, half-width characters have their own Unicode code points, and they are rendered half the size of full-width characters, although they are identical in shape otherwise. Example:
Fullwidth "ka": カ
Halfwidth "ka": ｶ
Combined characters (i.e. with diacritics like ガ) do not exist in halfwidth versions; they must be encoded as two separate characters: ｶ + ﾞ, which is probably the reason why in your task twice as many characters are allowed for halfwidth. (Note that such a sequence of two code points is treated as one combined character and is usually rendered as one.)
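To illustrate the point about diacritics, here is a small sketch that counts the code points in the halfwidth and fullwidth spellings of "ga". It assumes Ruby 2.2 or later for String#unicode_normalize, which is newer than the Ruby versions typically paired with Rails 2.3.5. The characters are written as \u escapes so the halfwidth forms survive copy and paste.

```ruby
# -*- coding: utf-8 -*-
# Halfwidth "ga" is two code points: halfwidth KA (U+FF76) followed by
# the halfwidth voiced sound mark (U+FF9E).
halfwidth_ga = "\uFF76\uFF9E"
# Fullwidth "ga" is the single precomposed code point U+30AC.
fullwidth_ga = "\u30AC"

puts halfwidth_ga.length  # 2
puts fullwidth_ga.length  # 1

# NFKC normalization folds the halfwidth pair into the fullwidth character.
puts halfwidth_ga.unicode_normalize(:nfkc) == fullwidth_ga  # true
```

So a name typed in halfwidth Katakana can need up to twice as many code points as the same name in fullwidth.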
For Roman (Latin) characters, the usual ASCII characters are called halfwidth, but the Japanese code range of Unicode (as well as traditional Japan-specific character sets) provides a separate code range for fullwidth versions. Example:
Fullwidth: Ｌ
Halfwidth: L
Fullwidth versions do not exist for non-ASCII Latin-derived characters (such as German umlauts), nor for accented versions. They do, however, exist for numerals and some punctuation characters.
Again, Hiragana and Kanji have no halfwidth versions.
To check whether a character is a fullwidth or halfwidth character, compare the code point to the relevant code range. The ranges are as follows:
Halfwidth Katakana: 0xff61 through 0xff9f
Fullwidth Katakana: 0x30a0 through 0x30ff
Halfwidth Roman: 0x21 through 0x7e (this is ASCII)
Fullwidth Roman: 0xff01 through 0xff60
Hiragana: 0x3041 through 0x309f
Kanji (i.e. the unified-ideographs range): 0x4e00 through 0x9fcc
Here is a simple Ruby program that performs the checks on a per-character basis:
# -*- coding: utf-8 -*-
# Each predicate takes a one-character string and tests its code point
# against the corresponding Unicode range. String#ord needs Ruby 1.9+.
def is_halfwidth_katakana(c)
  c.ord.between?(0xff61, 0xff9f)
end

def is_fullwidth_katakana(c)
  c.ord.between?(0x30a0, 0x30ff)
end

def is_halfwidth_roman(c)
  c.ord.between?(0x21, 0x7e)
end

def is_fullwidth_roman(c)
  c.ord.between?(0xff01, 0xff60)
end

def is_hiragana(c)
  c.ord.between?(0x3041, 0x309f)
end

def is_kanji(c)
  c.ord.between?(0x4e00, 0x9fcc)
end

# Sample text mixing halfwidth Roman, Hiragana, Kanji, halfwidth and
# fullwidth Katakana, and fullwidth Roman letters and digits.
text = "Hello World、こんにちは、半角ｶﾀｶﾅ、全角カタカナ、ｆｕｌｌｗｉｄｔｈ ０-９\n"

text.each_char do |c|
  if is_halfwidth_katakana(c)
    type = "halfwidth katakana"
  elsif is_fullwidth_katakana(c)
    type = "fullwidth katakana"
  elsif is_halfwidth_roman(c)
    type = "halfwidth roman"
  elsif is_fullwidth_roman(c)
    type = "fullwidth roman"
  elsif is_hiragana(c)
    type = "hiragana"
  elsif is_kanji(c)
    type = "kanji"
  else
    # Spaces, newlines and punctuation such as 、 match none of the ranges.
    type = "other"
  end
  printf("%c (%x) %s\n", c, c.ord, type)
end
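Building on the per-character checks, one possible reading of the original requirement (14 halfwidth or 7 fullwidth characters) is that a halfwidth character occupies one column, anything else occupies two, and the budget is 14 columns. This is an assumption about the intent, not something the question states. The helper names halfwidth?, display_width and name_fits? are hypothetical, and unlike the ranges above this sketch also counts the ASCII space (0x20) as halfwidth.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper: ASCII (including space) and halfwidth Katakana
# count as halfwidth; everything else is treated as fullwidth.
def halfwidth?(c)
  cp = c.ord
  (0x20..0x7e).cover?(cp) || (0xff61..0xff9f).cover?(cp)
end

# One column per halfwidth character, two columns per fullwidth character.
def display_width(string)
  string.each_char.inject(0) { |sum, c| sum + (halfwidth?(c) ? 1 : 2) }
end

# Assumed rule: a name fits if it needs at most 14 columns.
def name_fits?(name, max_columns = 14)
  display_width(name) <= max_columns
end

puts display_width("Tanaka")              # 6
puts display_width("\u30BF\u30CA\u30AB")  # 6 (three fullwidth Katakana)
puts name_fits?("\u30AB" * 7)             # true  (7 fullwidth = 14 columns)
puts name_fits?("\u30AB" * 8)             # false (8 fullwidth = 16 columns)
```

With this rule a mixed name is charged per character instead of being treated as entirely fullwidth, which may or may not match what the form is supposed to enforce.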
Further notes
The code ranges above are the official Unicode ranges for each character type (see Unicode Fullwidth forms and Unicode Hiragana). These include certain fullwidth / halfwidth versions of characters that are old / traditional forms or special punctuation characters. If you only want characters that are commonly used in web forms (e.g. for people to enter their names), you might want to narrow the ranges a bit.
Recommendation: If this is for a web form where people can enter their names, you might want to do a little more than just check for half-width or full-width. It is extremely common on Japanese websites and registration forms, especially with banks, to require that people enter their name in pure halfwidth (typically for Latin) or pure fullwidth (typically for Katakana). Unfortunately, this makes entering data very inconvenient. When the Japanese input method is enabled, Latin characters often come out in fullwidth versions, and the web form will then reject the data because it isn't pure halfwidth. Rather than rejecting the input, the form should automatically convert it to whatever form it needs. For Roman characters you can easily implement this by translating from one code range to the other (the fullwidth and ASCII ranges differ by a constant offset), which makes people's lives much easier. Katakana needs a mapping table instead, because the halfwidth and fullwidth Katakana ranges are not laid out in the same order.
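As a minimal sketch of that conversion for Roman characters: the fullwidth forms U+FF01 through U+FF5E mirror ASCII 0x21 through 0x7E at a constant offset of 0xFEE0, so they can be folded to halfwidth by subtraction. The helper name to_halfwidth_roman is hypothetical, and the ideographic space (U+3000) is handled as a special case because it does not sit at that offset. Katakana is deliberately left untouched here.

```ruby
# -*- coding: utf-8 -*-
FULLWIDTH_OFFSET = 0xfee0  # 0xff01 - 0x21

# Hypothetical helper: fold fullwidth Roman characters to ASCII and leave
# everything else (Kana, Kanji, existing ASCII) unchanged.
def to_halfwidth_roman(string)
  string.each_char.map do |c|
    cp = c.ord
    if cp >= 0xff01 && cp <= 0xff5e
      [cp - FULLWIDTH_OFFSET].pack("U")
    elsif cp == 0x3000  # ideographic (fullwidth) space
      " "
    else
      c
    end
  end.join
end

# Fullwidth "Tanaka", written with \u escapes.
puts to_halfwidth_roman("\uFF34\uFF41\uFF4E\uFF41\uFF4B\uFF41")  # Tanaka
```

Running the input through such a helper before validation means a name typed with the Japanese input method enabled is accepted instead of rejected.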
The following code may just push you over the line to fulfil the exact requirement you've so far specified in the least possible time. It uses the Moji gem (Japanese documentation), which gives lots of convenience methods in determining the content of a Japanese language string.
It validates a maximum of 14 characters in a name that only consists of half-width characters, and a maximum of 7 characters for names otherwise (including names that contain a combination of half- and full-width characters, i.e. the presence of even one full-width character in the string will make the whole string be regarded as "full-width").
class Customer
  validates_length_of :name, :maximum => 14,
    :if => Proc.new { |customer| customer.half_width?(customer.name) }
  validates_length_of :name, :maximum => 7,
    :unless => Proc.new { |customer| customer.half_width?(customer.name) }

  # Called on the record, because self inside the Procs above is the class.
  def half_width?(string)
    Moji.type?(string, Moji::HAN_KATA)
  end
end
Assumptions made: