In the Unicode standard, there are diacritical marks, such as U+0302, COMBINING CIRCUMFLEX ACCENT (◌̂), and U+02C6, MODIFIER LETTER CIRCUMFLEX ACCENT (ˆ). I know that combining characters are combined with the previous letter to, say, make a letter like "ô", but what are modifier letters used for? Is it just a printable representation of the combining character, and if so, how is that different from the plain U+005E, CIRCUMFLEX ACCENT (^)?
[I'm not interested in the circumflex itself, but rather in this class of characters (there seem to be many of them, as you can see here).]
In digital typography, combining characters are characters that are intended to modify other characters. The most common combining characters in the Latin script are the combining diacritical marks (including combining accents).
Depending on the application or browser, there are two ways to use the Unicode Combining Diacritical Marks. Taking ā (a with macron) as an example, you can type the 'a' first and follow it either with the decimal code &#772; or with ALT+ (it must be the + from the numeric keypad) followed by the hexadecimal code 0304 (i.e. U+0304).
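The decimal and hexadecimal codes are the same number: 0x0304 is 772 in decimal. The sketch below (the class and method names are illustrative only) checks that conversion and also shows that, once the combining macron follows 'a', Unicode normalization (NFC) can replace the pair with the single precomposed letter U+0101, LATIN SMALL LETTER A WITH MACRON.

```java
import java.text.Normalizer;

public class MacronDemo {
    // Convert a hexadecimal code such as "0304" to its decimal value,
    // which is what a decimal character reference like &#NNN; uses.
    static int hexToDecimal(String hex) {
        return Integer.parseInt(hex, 16);
    }

    // 'a' followed by U+0304 COMBINING MACRON, normalized to NFC so the
    // pair is replaced by the precomposed letter where one exists.
    static String composedMacronA() {
        return Normalizer.normalize("a\u0304", Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println("U+0304 in decimal: " + hexToDecimal("0304"));
        String s = composedMacronA();
        System.out.printf("NFC of a + U+0304: %s (U+%04X, %d char)%n",
                s, s.codePointAt(0), s.length());
    }
}
```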
Inserting Unicode characters: in Microsoft Word, to insert a Unicode character, type the character code, press ALT, and then press X. For example, to type a dollar symbol ($), type 0024, press ALT, and then press X.
Modifier letters don't combine. Semantically they act as modifiers, unlike plain equivalents such as U+005E.
https://www.unicode.org/versions/Unicode11.0.0/ch07.pdf#G15832
7.8 Modifier Letters
Modifier letters, in the sense used in the Unicode Standard, are letters or symbols that are typically written adjacent to other letters and which modify their usage in some way. They are not formally combining marks (gc=Mn or gc=Mc) and do not graphically combine with the base letter that they modify. They are base characters in their own right. The sense in which they modify other letters is more a matter of their semantics in usage; they often tend to function as if they were diacritics, indicating a change in pronunciation of a letter, or otherwise distinguishing a letter’s use. Typically this diacritic modification applies to the character preceding the modifier letter, but modifier letters may sometimes modify a following character. Occasionally a modifier letter may simply stand alone representing its own sound.
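One way to see the distinction the quoted passage draws is to ask Java for each character's Unicode general category. This is a minimal sketch (the class and helper names are illustrative): U+0302 reports as a nonspacing mark (Mn), while U+02C6 reports as a modifier letter (Lm), i.e. a base character rather than a combining mark.

```java
public class CategoryDemo {
    // Translate Java's general-category constant into the Unicode abbreviation.
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.NON_SPACING_MARK:       return "Mn";
            case Character.COMBINING_SPACING_MARK: return "Mc";
            case Character.MODIFIER_LETTER:        return "Lm";
            case Character.MODIFIER_SYMBOL:        return "Sk";
            default:                               return "other";
        }
    }

    public static void main(String[] args) {
        int[] codePoints = {0x0302, 0x02C6};
        for (int cp : codePoints) {
            System.out.printf("U+%04X %s -> gc=%s%n",
                    cp, Character.getName(cp), category(cp));
        }
    }
}
```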
Example of o followed by five U+0302 vs. five U+02C6 vs. five U+005E: ô̂̂̂̂ oˆˆˆˆˆ o^^^^^
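The same three strings can be compared programmatically. The sketch below is an illustration (the class name is hypothetical, and `String.repeat` needs Java 11 or later). It builds each string and counts user-perceived characters with the JDK's character `BreakIterator`, which keeps a base character together with the combining marks that follow it.

```java
import java.text.BreakIterator;

public class ClusterDemo {
    // Count user-perceived characters using the JDK's character BreakIterator,
    // which treats a base character plus its combining marks as one unit.
    static int countCharacters(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String combining = "o" + "\u0302".repeat(5); // U+0302 COMBINING CIRCUMFLEX ACCENT
        String modifier  = "o" + "\u02C6".repeat(5); // U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT
        String plain     = "o" + "^".repeat(5);      // U+005E CIRCUMFLEX ACCENT
        for (String s : new String[] {combining, modifier, plain}) {
            System.out.println(s + " -> " + s.codePointCount(0, s.length())
                    + " code points, " + countCharacters(s) + " character(s)");
        }
    }
}
```

All three strings contain six code points, but only the combining version should collapse into a single character; the modifier-letter and plain versions each count as six.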
What is the difference between “combining characters” and “modifier letters”?
Combining characters
Combining characters are always applied to a preceding base character. Here is an example taken from section 5.13, Rendering Nonspacing Marks, of The Unicode Standard Version 11.0 – Core Specification, where a sequence of four combining characters is applied to the base character a:
Here's another example. Running this trivial Java code...
```java
System.out.println("Base character: \u0930");
System.out.println("Base with combining characters: \u0930\u0903\u0951");
```
...yielded this output:
In this case the output was wider than the base character; one of the combining characters was placed above the base character, and the other was placed to the right of the base character.
I've provided both examples as screen shots because it can be difficult to find a font to render the resulting glyphs correctly.
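The different placement in the Devanagari example is consistent with the general categories of the two marks. The sketch below (names are illustrative) prints them: U+0903 DEVANAGARI SIGN VISARGA is a spacing combining mark (Mc), which takes up horizontal space, while U+0951 DEVANAGARI STRESS SIGN UDATTA is a nonspacing mark (Mn), which is drawn over the base. The exact rendering still depends on the font.

```java
public class DevanagariDemo {
    // Map the categories relevant to this example to their Unicode abbreviations.
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.OTHER_LETTER:           return "Lo";
            case Character.COMBINING_SPACING_MARK: return "Mc";
            case Character.NON_SPACING_MARK:       return "Mn";
            default:                               return "other";
        }
    }

    public static void main(String[] args) {
        String s = "\u0930\u0903\u0951"; // RA, VISARGA, STRESS SIGN UDATTA
        s.codePoints().forEach(cp ->
                System.out.printf("U+%04X %-30s gc=%s%n",
                        cp, Character.getName(cp), category(cp)));
    }
}
```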
Modifier letters
In contrast to combining characters, modifier letters are freestanding. While they also usually modify another character (normally, but not necessarily, the preceding one), they are base characters themselves and are visually distinct. To use your example, here is the output from a Java application printing the base character A followed by U+0302, COMBINING CIRCUMFLEX ACCENT (◌̂), and by U+02C6, MODIFIER LETTER CIRCUMFLEX ACCENT (ˆ), respectively:
A 0302: Â
A 02C6: Aˆ
The MODIFIER LETTER CIRCUMFLEX ACCENT is rendered to the right of the A, whereas the COMBINING CIRCUMFLEX ACCENT is rendered above it.
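Unicode normalization makes the same difference visible without any font involved. In this sketch (the class and helper names are illustrative), NFC composes A + U+0302 into the single precomposed letter U+00C2 (Â), while A + U+02C6 is left as two separate characters, because a modifier letter is not a combining mark.

```java
import java.text.Normalizer;

public class ComposeDemo {
    // Canonical composition (NFC): combining sequences become precomposed letters where possible.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String combining = nfc("A\u0302"); // A + COMBINING CIRCUMFLEX ACCENT
        String modifier  = nfc("A\u02C6"); // A + MODIFIER LETTER CIRCUMFLEX ACCENT
        System.out.println(combining + " length " + combining.length()); // composes to U+00C2
        System.out.println(modifier + " length " + modifier.length());   // stays two characters
    }
}
```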
The actual meaning (semantics) of the circumflex character as a modifier letter is context-driven. For example, in French the circumflex on the o in côté affects its pronunciation, but the circumflex on the u in sûr does not; instead it is used to visually distinguish sûr (meaning sure) from the identically pronounced sur (meaning on). In French, a circumflex on o always affects pronunciation, and on u it never does.
Is it just a printable representation of the combining character...
No. The modifier letter carries meaning. In the case of the French circumflex, that meaning may be context-driven, depending on the letter it modifies, as described above. But the meaning can also be contained within the modifier letter itself. For example:
Modifier letters are commonly used in technical phonetic transcriptional systems, where they augment the use of combining marks to make phonetic distinctions. Some of them have been adapted into regular language orthographies as well. For example, U+02BB MODIFIER LETTER TURNED COMMA is used to represent the 'okina (glottal stop) in the orthography for Hawaiian.
That example also shows that a modifier letter need not be associated with any other character. That is never the case with combining characters, which always apply to a base character.
Also note that a modifier letter is not necessarily a letter in any alphabet, and the majority of modifier letters are actually symbols (e.g. the circumflex).
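Because U+02BB has general category Lm, Java's `Character.isLetter` treats it as a letter, so a word spelled with the ʻokina is letters all the way through, while the same word spelled with an ASCII apostrophe is not. This is a small illustrative sketch (the class and helper names are hypothetical).

```java
public class OkinaDemo {
    // True if every code point is a letter; Character.isLetter accepts
    // Lu, Ll, Lt, Lm and Lo, so modifier letters (Lm) count.
    static boolean allLetters(String word) {
        return word.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        String withOkina = "Hawai\u02BBi";  // U+02BB MODIFIER LETTER TURNED COMMA
        String withApostrophe = "Hawai'i";  // U+0027 APOSTROPHE
        System.out.println(withOkina + ": all letters = " + allLetters(withOkina));
        System.out.println(withApostrophe + ": all letters = " + allLetters(withApostrophe));
    }
}
```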
How is that different from the plain U+005E, CIRCUMFLEX ACCENT (^)?
That is simply the character used to represent a circumflex accent. Unlike combining characters and modifier letters, it cannot be semantically or visually associated with any other character.
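In terms of Unicode properties, U+005E is classified as a modifier symbol (general category Sk), not as a letter, whereas U+02C6 is a modifier letter (Lm). A minimal sketch to check this (the class and helper names are illustrative):

```java
public class CaretDemo {
    // True if the code point has general category Sk (modifier symbol).
    static boolean isModifierSymbol(int codePoint) {
        return Character.getType(codePoint) == Character.MODIFIER_SYMBOL;
    }

    public static void main(String[] args) {
        for (int cp : new int[] {0x005E, 0x02C6}) {
            System.out.printf("U+%04X %s: modifier symbol (Sk)=%b, letter=%b%n",
                    cp, Character.getName(cp), isModifierSymbol(cp), Character.isLetter(cp));
        }
    }
}
```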
See the following sections in The Unicode® Standard Version 11.0 – Core Specification for lots more detail: