I would like to have an index for a list. For example, the Android contact list has #, A-Z. But for many locales this doesn't cover all the locale-specific characters.
How well is this supported in different programming languages? I took a quick look, and in the case of core Java I didn't see anything m
Somewhat related: http://cldr.unicode.org/development/development-process/design-proposals/index-characters
Index characters are an ordered list of characters for use as a UI "index", that is, a list of clickable characters (or character sequences) that allow the user to see a segment of a larger "target" list.
This is a VERY good question!
As you note in the language-agnostic tag, the important thing isn’t the programming language. It’s the data set that you really need here. I know of no repository for such things. The ᴄʟᴅʀ data do not yet contain this thing. Here’s a simple table of sequences for various two-letter ɪsᴏ codes, plus a few extras for Asian sequences, written in Perl. This sort of thing could be the basis of a module.
It does require somewhat careful handling, because you can’t blindly titlecase the first grapheme in each element without regard to locale if you want an "uppercase"-ish set. That’s because of the Turkic I problem. I would install methods that pull out the sequences, and detect such things if they asked for something in the Turkic languages.
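To make the Turkic I problem concrete, here is a minimal sketch of what such a method might look like. The helper name `index_label` and the `%TURKIC` set are hypothetical, not part of any existing module, and only the dotted/dotless i pair is special-cased; a real implementation would probably lean on full Unicode special-casing data instead.

```perl
use utf8;
use strict;
use warnings;

# Assumption: these are the only Turkic language codes we care about here.
my %TURKIC = map { $_ => 1 } qw(tr az);

# Hypothetical helper: titlecase one index entry for display.
# Everything except the Turkic i pair goes through Perl's default,
# locale-independent ucfirst().
sub index_label {
    my ($entry, $lang) = @_;
    if ($TURKIC{$lang}) {
        return "\x{130}" . substr($entry, 1) if $entry =~ /^i/;        # i -> İ
        return "I"       . substr($entry, 1) if $entry =~ /^\x{131}/;  # ı -> I
    }
    return ucfirst $entry;
}

binmode STDOUT, ':encoding(UTF-8)';
print join(' ', map { index_label($_, 'tr') } qw(ı i ğ)), "\n";    # I İ Ğ
print join(' ', map { index_label($_, 'cy') } qw(ch dd i)), "\n";  # Ch Dd I
```

Note that multi-letter entries such as "ch" come out as "Ch" rather than "CH", which is usually what an index heading wants.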
use utf8;
use strict;
use warnings;
our %Alphabet = (
en => [qw(a b c d e f g h i j k l m n o p q r s t u v w x y z)],
br => [qw(a b ch c'h d e f g h i j k l m n o p r s t u v w y z)],
cy => [qw(a b c ch d dd e f ff g ng h i l ll m n o p ph r rh s t th u w y)],
ga => [qw(a á b c d e é f g h i í l m n o ó p r s t u ú)],
gd => [qw(a b c d e f g h i l m n o p r s t u)],
la => [qw(a b c d e f g h i k l m n o p q r s t v x y z)],
it => [qw(a b c d e f g h i k l m n o p q r s t u v z)],
es => [qw(a b c d e f g h i j k l m n ñ o p q r s t u v w x y z)],
es__traditional =>
[qw(a b c ch d e f g h i j k l ll m n ñ o p q r s t u v w x y z)],
eu => [qw(a b c ch d e f g h i j k l ll m n ñ o p q r s t ts tx tz u v w x y z)],
rm => [qw(a b c d e f g h i j l m n o p q r s t u v x z)],
ro => [qw(a ă â b c d e f g h i î j k l m n o p q r s ș t ț u v w x y z)],
oc => [qw(a b c d e f g h i j l m n o p q r s t u v x z)],
sw => [qw(a b c d e f g h i j k l m n o p q r s t u v w x y z å ä ö)], # Swedish; note the ISO 639-1 code is "sv" ("sw" is Swahili)
no => [qw(a b c d e f g h i j k l m n o p q r s t u v w x y z æ ø å)],
is => [qw(a á b d ð e é f g h i í j k l m n o ó p r s t u ú v x y ý þ æ ö)],
cs => [qw(a á b c č d ď e é ě f g h ch i í j k l m n ň o ó p q r ř s š t ť u ú ů v w x y ý z ž)],
sk => [qw(a á ä b c č d ď dz dž e é f g h ch i í j k l ĺ ľ m n ň o ó ô p q r ŕ s š t ť u ú v w x y ý z ž)],
sl => [qw(a b c č d e f g h i j k l m n o p r s š t u v z ž)],
pl => [qw(a ą b c ć d e ę f g h i j k l ł m n ń o ó p r s ś t u w y z ź ż)],
lt => [qw(a ą b c č d e ę ė f g h i į y j k l m n o p r s š t u ų ū v z ž)],
lv => [qw(a ā b c č d e ē f g ģ h i ī j k ķ l ļ m n ņ o p r s š t u ū v z ž)],
et => [qw(a b d e f g h i j k l m n o p r s š z ž t u v õ ä ö ü)],
et__full =>
[qw(A B C D E F G H I J K L M N O P Q R S Š Z Ž T U V W Õ Ä Ö Ü X Y)],
et__simple => [qw(a b d e g h i j k l m n o p r s t u v õ ä ö ü)],
hu => [qw(a á b c cs d dz dzs e é f g gy h i í j k l ly m n ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs)],
hu__traditional =>
[qw(a á b c cs d dz dzs e é f g gy h i í j k l ly m n ny o ó ö ő p r s sz t ty u ú ü ű v z zs)],
tr => [qw(a b c ç d e f g ğ h ı i j k l m n o ö p r s ş t u ü v y z)],
az => [qw(a b c ç d e ə f g ğ h x ı i j k q l m n o ö p r s ş t u ü v y z)],
az_1918_1939 =>
[qw(a в c ç d e ə f g ƣ h i ь j k q l m n o ɵ p r s ş t u v x y z ƶ)],
az_1939_1958 =>
[qw(а б в г ғ д е ё ә ж з и й к қ л м н о ө п р с т у ү ф х һ ц ч ҷ ш щ ъ ы ь э ю я ')],
az_1958_1991 =>
[qw(а б в г ғ д e ә ж з и ы ј к ҝ л м н о ө п р с т у ү ф х һ ч ҹ ш ')],
az_1991_1992 =>
[qw(a ä b c ç d e f g ğ h x ı i j k q l m n o ö p r s ş t u ü v y z)],
el => [qw(α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σ τ υ φ χ ψ ω)],
ru => [qw(а б в г д е ж з и к л м н о п р с т у ф х ц ч ш щ ы э ю я)],
uk => [qw(а б в г ґ д е є ж з и і ї й к л м н о п р с т у ф х ц ч ш щ ь ю я)],
mk => [qw(а б в г д ѓ е ж з ѕ и ј к л љ м н њ о п р с т ќ у ф х ц ч џ ш)],
"HIRAGANA AIUEO" =>
[qw(あ い う え お か き く け こ さ し す せ そ た ち つ て と な に ぬ ね の は ひ ふ へ ほ ま み む め も や ゆ よ ら り る れ ろ わ を ん)],
"KATAKANA AIUEO" =>
[qw(ア イ ウ エ オ カ キ ク ケ コ サ シ ス セ ソ タ チ ツ テ ト ナ ニ ヌ ネ ノ ハ ヒ フ ヘ ホ マ ミ ム メ モ ヤ ユ ヨ ラ リ ル レ ロ ワ ヲ ン)],
"HALFWIDTH KATAKANA AIUEO" =>
[qw(ｱ ｲ ｳ ｴ ｵ ｶ ｷ ｸ ｹ ｺ ｻ ｼ ｽ ｾ ｿ ﾀ ﾁ ﾂ ﾃ ﾄ ﾅ ﾆ ﾇ ﾈ ﾉ ﾊ ﾋ ﾌ ﾍ ﾎ ﾏ ﾐ ﾑ ﾒ ﾓ ﾔ ﾕ ﾖ ﾗ ﾘ ﾙ ﾚ ﾛ ﾜ ｦ ﾝ)],
"KATAKANA IROHA" =>
[qw(イ ロ ハ ニ ホ ヘ ト チ リ ヌ ル ヲ ワ カ ヨ タ レ ソ ツ ネ ナ ラ ム ウ ヰ ノ オ ク ヤ マ ケ フ コ エ テ ア サ キ ユ メ ミ シ ヱ ヒ モ セ ス)],
"HIRAGANA IROHA" =>
[qw(い ろ は に ほ へ と ち り ぬ る を わ か よ た れ そ つ ね な ら む う ゐ の お く や ま け ふ こ え て あ さ き ゆ め み し ゑ ひ も せ す)],
"HALFWIDTH KATAKANA IROHA" =>
[qw(ｲ ﾛ ﾊ ﾆ ﾎ ﾍ ﾄ ﾁ ﾘ ﾇ ﾙ ｦ ﾜ ｶ ﾖ ﾀ ﾚ ｿ ﾂ ﾈ ﾅ ﾗ ﾑ ｳ ﾉ ｵ ｸ ﾔ ﾏ ｹ ﾌ ｺ ｴ ﾃ ｱ ｻ ｷ ﾕ ﾒ ﾐ ｼ ﾋ ﾓ ｾ ｽ)],
"HANGUL CHOSUNG" =>
[qw(ㄱ ㄴ ㄷ ㄹ ㅁ ㅂ ㅅ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ)],
"HANGUL GANADA" =>
[qw(가 나 다 라 마 바 사 아 자 차 카 타 파 하)],
"CHINESE ZODIAC 10" =>
[qw(甲 乙 丙 丁 戊 己 庚 辛 壬 癸)],
"CHINESE ZODIAC 12" =>
[qw(子 丑 寅 卯 辰 巳 午 未 申 酉 戌 亥)],
"ZODIAC" => [qw(♈ ♉ ♊ ♋ ♌ ♍ ♎ ♏ ♐ ♑ ♒ ♓ )],
);
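# Languages that reuse an alphabet already listed above.
# The loop variable is not called $a, so it cannot shadow sort's $a.
for my $alphabet (\%Alphabet) {
    $$alphabet{da} = $$alphabet{no};
    $$alphabet{fi} = $$alphabet{sw};    # Finnish ends in å ä ö like Swedish, not æ ø å
    $$alphabet{de} = $$alphabet{en};
    $$alphabet{fr} = $$alphabet{en};
    $$alphabet{pt} = $$alphabet{en};
}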
1;
That should certainly be enough to get you started, though.
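As a rough illustration of how such a table might drive an index, the sketch below files words under the longest matching entry, so that "ch" wins over "c". The helper name `index_key` and the "#" catch-all bucket are assumptions made for this example, and only the Welsh entry is copied in to keep it self-contained. Plain prefix matching is crude (not every "ng" in Welsh is the digraph, for instance), so treat this as a starting point rather than a finished collation scheme.

```perl
use utf8;
use strict;
use warnings;

# One entry copied from the table above; a real module would load them all.
my %Alphabet = (
    cy => [qw(a b c ch d dd e f ff g ng h i l ll m n o p ph r rh s t th u w y)],
);

# Hypothetical helper: return the index entry a word files under,
# or '#' when no entry matches. Longer entries are tried first.
# lc() is locale-blind, so Turkic input would need extra care here.
sub index_key {
    my ($word, $lang) = @_;
    my $lower = lc $word;
    for my $entry (sort { length($b) <=> length($a) } @{ $Alphabet{$lang} }) {
        return $entry if index($lower, $entry) == 0;
    }
    return '#';
}

my %bucket;
push @{ $bucket{ index_key($_, 'cy') } }, $_
    for qw(Chwarae Cath Llyfr Lamp Ffenestr 1984);

binmode STDOUT, ':encoding(UTF-8)';
# Walk the alphabet in its own order so the index is sorted for this language.
for my $entry (@{ $Alphabet{cy} }, '#') {
    next unless $bucket{$entry};
    print "$entry: @{ $bucket{$entry} }\n";
}
```

Iterating over the alphabet array, rather than sorting the hash keys, is what keeps "ch" after "c" and "ll" after "l" in the printed index.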