Why does this print a U and not a Ü?
#!/usr/bin/env perl
use warnings;
use 5.014;
use utf8;
binmode STDOUT, ':utf8';
use charnames qw(:full);
my $string = "\N{LATIN CAPITAL LETTER U}\N{COMBINING DIAERESIS}";
while ( $string =~ /(\X)/g ) {
say $1;
}
# Output: U
Your code is correct.
You really do need to play these things by the numbers; don’t trust what a "terminal" displays. Pipe it through the uniquote program, probably with -x or -v, and see what it is really doing.
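If uniquote is not at hand, a rough stand-in can be written in a few lines. The sketch below is only an illustration, not what uniquote does internally: `escape_codepoints` is a hypothetical helper name, and the rule "escape everything outside printable ASCII" is an assumption chosen to resemble the -x output. It runs the loop from the question and prints the code points that each \X match actually holds, so the result does not depend on how the terminal draws combining marks.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.014;
use charnames qw(:full);

# Hypothetical helper (not part of uniquote): keep printable ASCII as-is
# and show every other code point as a \x{...} escape.
sub escape_codepoints {
    my ($text) = @_;
    return join '', map {
        my $cp = ord($_);
        ( $cp >= 0x20 && $cp < 0x7F ) ? $_ : sprintf( '\\x{%X}', $cp );
    } split //, $text;
}

my $string = "\N{LATIN CAPITAL LETTER U}\N{COMBINING DIAERESIS}";

# Same loop as in the question; the output is pure ASCII, so the
# terminal cannot misrender it.
while ( $string =~ /(\X)/g ) {
    say length($1), ' code point(s): ', escape_codepoints($1);
}
# Prints: 2 code point(s): U\x{308}
```

A single line reporting two code points means \X matched the base letter and the combining mark as one grapheme cluster, which is the behaviour the question expects.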
Eyes deceive, and programs are even worse. Your terminal program is buggy, so it is lying to you. Normalization shouldn’t matter: a correct terminal renders the precomposed (NFC) and decomposed (NFD) forms identically.
$ perl -CS -Mutf8 -MUnicode::Normalize -E 'say "crème brûlée"'
crème brûlée
$ perl -CS -Mutf8 -MUnicode::Normalize -E 'say "crème brûlée"' | uniquote -x
cr\x{E8}me br\x{FB}l\x{E9}e
$ perl -CS -Mutf8 -MUnicode::Normalize -E 'say NFD "crème brûlée"'
crème brûlée
$ perl -CS -Mutf8 -MUnicode::Normalize -E 'say NFD "crème brûlée"' | uniquote -x
cre\x{300}me bru\x{302}le\x{301}e
$ perl -CS -Mutf8 -MUnicode::Normalize -E 'say NFC scalar reverse NFD "crème brûlée"'
éel̂urb em̀erc
$ perl -CS -Mutf8 -MUnicode::Normalize -E 'say NFC scalar reverse NFD "crème brûlée"' | uniquote -x
\x{E9}el\x{302}urb em\x{300}erc
$ perl -CS -Mutf8 -MUnicode::Normalize -E 'say scalar reverse NFD "crème brûlée"'
éel̂urb em̀erc
$ perl -CS -Mutf8 -MUnicode::Normalize -E 'say scalar reverse NFD "crème brûlée"' | uniquote -x
e\x{301}el\x{302}urb em\x{300}erc
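In the reversed examples above, each combining mark ends up on the wrong letter because reverse works on code points, not graphemes. As a minimal sketch of the alternative, the same \X used in the question can split the string into grapheme clusters before reversing. The helper names `reverse_graphemes` and `escape_codepoints` are hypothetical, and the output is escaped to ASCII so that the terminal's rendering plays no part.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.014;
use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper: reverse a string one grapheme cluster at a time,
# so each combining mark stays attached to its base character.
sub reverse_graphemes {
    my ($text) = @_;
    my @clusters = $text =~ /\X/g;
    return join '', reverse @clusters;
}

# Hypothetical helper: show non-ASCII code points as \x{...} escapes.
sub escape_codepoints {
    my ($text) = @_;
    return join '', map {
        my $cp = ord($_);
        ( $cp >= 0x20 && $cp < 0x7F ) ? $_ : sprintf( '\\x{%X}', $cp );
    } split //, $text;
}

# "crème brûlée", written with escapes so no "use utf8" is needed.
my $decomposed = NFD("cr\x{E8}me br\x{FB}l\x{E9}e");

say escape_codepoints( NFC( reverse_graphemes($decomposed) ) );
# Prints: e\x{E9}l\x{FB}rb em\x{E8}rc
```

Here every accent stays on its original vowel, whereas the code-point reversal shown above moves the circumflex to the l and the grave to the m.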
This works for me, though I have an older version of perl, 5.012, on Ubuntu. My only change to your script is: use 5.012;
$ perl so.pl
Ü