I want a table of vowels with diacritics, but I don't want to search symbol tables manually.
Is it possible to generate this table by crossing a list of vowels with a list of diacritics in one of the following languages: Java, PHP, Wolfram Mathematica, a .NET language, and so on?
I need Unicode characters as output.
Java Solution
I found that there is a special Unicode feature for this, Unicode normalization: http://en.wikipedia.org/wiki/Unicode_normalization
Java has supported it since 1.6: http://docs.oracle.com/javase/6/docs/api/java/text/Normalizer.html
So, the sample code is:
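import java.text.Normalizer;
import javax.swing.JOptionPane;

// The class name is arbitrary; the original snippet showed only main().
public class VowelTable {
    public static void main(String[] args) {
        String vowels = "aeiou";
        // Combining macron, acute, grave and caron
        char[] diacritics = {'\u0304', '\u0301', '\u0300', '\u030C'};
        StringBuilder sb = new StringBuilder();
        for (int v = 0; v < vowels.length(); ++v) {
            for (int d = 0; d < diacritics.length; ++d) {
                // Base letter followed by a combining mark
                sb.append(vowels.charAt(v));
                sb.append(diacritics[d]);
                sb.append(' ');
            }
            // End each row with the bare vowel
            sb.append(vowels.charAt(v));
            sb.append('\n');
        }
        // NFC composes each base + mark pair into a precomposed character where one exists
        String ans = Normalizer.normalize(sb.toString(), Normalizer.Form.NFC);
        JOptionPane.showMessageDialog(null, ans);
    }
}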
In other words, we put each combining diacritic after its vowel and then apply NFC normalization to the whole string.
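NFC can only compose a pair when Unicode defines a precomposed character for it; otherwise the base letter and the combining mark stay as two code points. The following is a minimal sketch of the same idea that returns the table as a string and checks whether a given pair was composed. The class and method names (`DiacriticTable`, `build`, `hasPrecomposedForm`) are made up for illustration, and the bare vowel is placed first in each row rather than last.

```java
import java.text.Normalizer;

// Hypothetical helper class; names are illustrative, not from the original answer.
final class DiacriticTable {

    /** Builds one row per vowel: the bare vowel, then one NFC-normalized form per diacritic. */
    static String build(String vowels, String diacritics) {
        StringBuilder sb = new StringBuilder();
        for (char v : vowels.toCharArray()) {
            sb.append(v);
            for (char d : diacritics.toCharArray()) {
                // Base letter + combining mark, composed by NFC where possible
                String composed = Normalizer.normalize("" + v + d, Normalizer.Form.NFC);
                sb.append(' ').append(composed);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** True if NFC turned base + mark into a single code point. */
    static boolean hasPrecomposedForm(char base, char mark) {
        String nfc = Normalizer.normalize("" + base + mark, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length()) == 1;
    }

    public static void main(String[] args) {
        // Macron, acute, grave, caron
        System.out.print(build("aeiou", "\u0304\u0301\u0300\u030C"));
        System.out.println(hasPrecomposedForm('a', '\u0301')); // true: U+00E1 exists
        System.out.println(hasPrecomposedForm('q', '\u0301')); // false: no precomposed q with acute
    }
}
```

Whether the console shows the characters correctly depends on its encoding and font, which is presumably why the original answer used a dialog instead.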
For an accented a (á), for example, just type the apostrophe, then the a. For an è, type the backward apostrophe ` then the e: è. An i with a circumflex is created by typing the circumflex (shift 6: ^) then the i: î. This also works with the double quote (") for the umlaut (dieresis) and the tilde (ñ, ã).
The diaeresis (/daɪˈɛrəsɪs, -ˈɪər-/ dy-ERR-ə-sis, -EER-) is a diacritical mark used to indicate the separation of two distinct vowels in adjacent syllables when an instance of diaeresis (or hiatus) occurs, so as to distinguish them from a digraph or diphthong.
The Great Vowel Shift, which occurred between 1350 and 1700, brought a great many phonetic changes, essentially leading to a condition where our spelling reflects a language that once didn't really need accent marks, but now probably does.
A diacritic is a mark near or through an alphabetic character to represent a pronunciation different from that of the unmarked character.
To be honest, I haven't completely deciphered what Szabolcs' code is doing, but in this particular case the following seems to produce the same result in Mathematica using slightly less code:
data = Import["http://unicode.org/Public/UNIDATA/NamesList.txt", "Lines"];
codes = Cases[data,
   b_String /; StringMatchQ[
      b, ___ ~~ "LATIN " ~~ "CAPITAL" | "SMALL" ~~ " LETTER " ~~
       "A" | "E" | "I" | "O" | "U" ~~ " WITH " ~~ ___] :>
    FromDigits[StringTake[b, 4], 16], Infinity];
FromCharacterCode[codes]
which produces
"ÀÁÂÃÄÅÈÉÊËÌÍÎÏÒÓÔÕÖØÙÚÛÜàáâãäåèéêëìíîïòóôõöøùúûüĀāĂ㥹ĒēĔĕĖėĘęĚěĨĩĪīĬ\
ĭĮįİŌōŎŏŐőŨũŪūŬŭŮůŰűŲųƗƟƠơƯưǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǞǟǠǡǪǫǬǭǺǻǾǿȀȁȂȃȄȅȆȇȈȉȊȋȌȍ\
ȎȏȔȕȖȗȦȧȨȩȪȫȬȭȮȯȰȱȺɆɇɨᶏᶒᶖᶙḀḁḔḕḖḗḘḙḚḛḜḝḬḭḮḯṌṍṎṏṐṑṒṓṲṳṴṵṶṷṸṹṺṻẚẠạẢảẤấẦầẨ\
ẩẪẫẬậẮắẰằẲẳẴẵẶặẸẹẺẻẼẽẾếỀềỂểỄễỆệỈỉỊịỌọỎỏỐốỒồỔổỖỗỘộỚớỜờỞởỠỡỢợỤụỦủỨứỪừỬửỮ\
ữỰựⱥⱸⱺꝊꝋꝌꝍ"
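For comparison with the Java solution above, roughly the same name-based filter can be sketched in Java without downloading NamesList.txt, since `Character.getName` (available since Java 7) returns the Unicode name of a code point. The class and method names here are hypothetical, the scan is limited to the Basic Multilingual Plane to mirror the four-hex-digit assumption in the Mathematica code, and the exact set of characters returned depends on the Unicode version bundled with the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; a sketch of the name-filter approach, not a port of the Mathematica code.
final class AccentedVowelsByName {

    // Matches names such as "LATIN SMALL LETTER A WITH ACUTE"
    private static final Pattern NAME =
        Pattern.compile("LATIN (CAPITAL|SMALL) LETTER [AEIOU] WITH .*");

    /** Collects every BMP code point whose Unicode name matches the pattern. */
    static String collect() {
        StringBuilder sb = new StringBuilder();
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            // getName returns null for unassigned code points
            String name = Character.getName(cp);
            if (name != null && NAME.matcher(name).matches()) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect());
    }
}
```

Unlike the normalization approach, this also picks up letters whose modification is not a combining mark, such as Ø (LATIN CAPITAL LETTER O WITH STROKE), which is why it appears in the output above.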