How to generate a diacritized vowel table automatically?

I want a table of vowels with diacritics, but I don't want to search symbol tables manually.

Is it possible to generate this table by crossing a list of vowels with a list of diacritics in one of the following languages: Java, PHP, Wolfram Mathematica, .NET languages and so on?

The output needs to be Unicode characters.

Java Solution

I found that there is a special Unicode feature for this, normalization: http://en.wikipedia.org/wiki/Unicode_normalization

Java has supported it since 1.6 through java.text.Normalizer: http://docs.oracle.com/javase/6/docs/api/java/text/Normalizer.html

So, the sample code is:

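import java.text.Normalizer;
import javax.swing.JOptionPane;

public class DiacritizedVowels {
    public static void main(String[] args) {
        String vowels = "aeiou";
        // Combining macron, acute, grave and caron
        char[] diacritics = {'\u0304', '\u0301', '\u0300', '\u030C'};
        StringBuilder sb = new StringBuilder();

        for (int v = 0; v < vowels.length(); ++v) {
            for (int d = 0; d < diacritics.length; ++d) {
                // Vowel followed by one combining mark
                sb.append(vowels.charAt(v));
                sb.append(diacritics[d]);
                sb.append(' ');
            }
            // End each row with the bare vowel
            sb.append(vowels.charAt(v));
            sb.append('\n');
        }

        // NFC merges each vowel + mark pair into a precomposed character where one exists
        String ans = Normalizer.normalize(sb.toString(), Normalizer.Form.NFC);

        JOptionPane.showMessageDialog(null, ans);
    }
}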

That is, we put a combining diacritic after each vowel and then apply NFC normalization to the whole string, which replaces each pair with its precomposed character.
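
The same idea can be sketched without the Swing dialog, so that it is easier to run and check from a console. This is only a variation on the code above, not a different technique: the class name `DiacriticTable` and the helper `build` are made up for illustration, and each vowel/mark pair is normalized on its own rather than normalizing the whole string at the end.

```java
import java.text.Normalizer;

public class DiacriticTable {

    // Hypothetical helper: one row per vowel, holding the vowel combined
    // with each mark, followed by the bare vowel.
    static String build(String vowels, String marks) {
        StringBuilder sb = new StringBuilder();
        for (char v : vowels.toCharArray()) {
            for (char m : marks.toCharArray()) {
                // NFC composes the pair into one precomposed character if
                // Unicode defines one; otherwise the pair is left unchanged.
                String cell = Normalizer.normalize("" + v + m, Normalizer.Form.NFC);
                sb.append(cell).append(' ');
            }
            sb.append(v).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Combining macron, acute, grave and caron, as in the code above.
        System.out.print(build("aeiou", "\u0304\u0301\u0300\u030C"));
    }
}
```

For the five vowels and four marks used here every pair has a precomposed form, so each cell is a single character. A pair with no precomposed form (for example q followed by a combining acute) stays as two code points after normalization, which is worth keeping in mind if the lists are extended.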

Dims asked Jan 08 '12 11:01


1 Answer

To be honest, I haven't completely deciphered what Szabolcs' code is doing, but in this particular case the following seems to produce the same result in Mathematica with slightly less code:

data = Import["http://unicode.org/Public/UNIDATA/NamesList.txt", "Lines"];

codes = Cases[data, 
 b_String /; StringMatchQ[
  b, ___ ~~ "LATIN " ~~ "CAPITAL" | "SMALL" ~~ " LETTER " ~~ 
   "A" | "E" | "I" | "O" | "U" ~~ " WITH " ~~ ___] :> 
    FromDigits[StringTake[b, 4], 16], Infinity];

FromCharacterCode[codes]

which produces

"ÀÁÂÃÄÅÈÉÊËÌÍÎÏÒÓÔÕÖØÙÚÛÜàáâãäåèéêëìíîïòóôõöøùúûüĀāĂăĄąĒēĔĕĖėĘęĚěĨĩĪīĬ\
ĭĮįİŌōŎŏŐőŨũŪūŬŭŮůŰűŲųƗƟƠơƯưǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǞǟǠǡǪǫǬǭǺǻǾǿȀȁȂȃȄȅȆȇȈȉȊȋȌȍ\
ȎȏȔȕȖȗȦȧȨȩȪȫȬȭȮȯȰȱȺɆɇɨᶏᶒᶖᶙḀḁḔḕḖḗḘḙḚḛḜḝḬḭḮḯṌṍṎṏṐṑṒṓṲṳṴṵṶṷṸṹṺṻẚẠạẢảẤấẦầẨ\
ẩẪẫẬậẮắẰằẲẳẴẵẶặẸẹẺẻẼẽẾếỀềỂểỄễỆệỈỉỊịỌọỎỏỐốỒồỔổỖỗỘộỚớỜờỞởỠỡỢợỤụỦủỨứỪừỬửỮ\
ữỰựⱥⱸⱺꝊꝋꝌꝍ"
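
For readers working in Java rather than Mathematica, a rough equivalent of this name-based filtering might look like the sketch below. It assumes Java 7 or later, where `Character.getName` exposes the Unicode character names bundled with the JDK, so no download of NamesList.txt is needed. The class name `NamedVowels` and the method `collect` are made up for illustration, and the exact set of characters returned depends on the Unicode version of the JDK in use, so it may not match the output above character for character.

```java
import java.util.regex.Pattern;

public class NamedVowels {

    // Mirrors the name pattern used above: a Latin capital or small
    // A, E, I, O or U whose name continues with "WITH ...".
    private static final Pattern NAME =
        Pattern.compile(".*LATIN (CAPITAL|SMALL) LETTER [AEIOU] WITH .*");

    // Hypothetical helper: scans the Basic Multilingual Plane only, which
    // corresponds to the four-hex-digit code points read by the code above.
    static String collect() {
        StringBuilder sb = new StringBuilder();
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            String name = Character.getName(cp); // null for unassigned code points
            if (name != null && NAME.matcher(name).matches()) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect());
    }
}
```

Unlike the normalization approach, this finds every named variant, including letters with strokes, hooks or several marks, at the cost of not letting you choose which diacritics appear.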

Heike answered Nov 01 '22 09:11