What is the difference between "combining characters" and "modifier letters"?

In the Unicode standard, there are diacritical marks, such as U+0302, COMBINING CIRCUMFLEX ACCENT (◌̂), and U+02C6, MODIFIER LETTER CIRCUMFLEX ACCENT (ˆ). I know that combining characters are combined with the previous letter to, say, make a letter like "ô", but what are modifier letters used for? Is it just a printable representation of the combining character, and if so, how is that different from the plain U+005E, CIRCUMFLEX ACCENT (^)?

[I'm not interested in the circumflex itself, but rather in this class of characters (there seem to be many of them, as you can see here).]

Greg asked Jan 30 '19 22:01


2 Answers

Modifier letters don't combine with a base character. They are standalone characters that act as modifiers semantically, unlike plain equivalents such as U+005E.

https://www.unicode.org/versions/Unicode11.0.0/ch07.pdf#G15832

7.8 Modifier Letters

Modifier letters, in the sense used in the Unicode Standard, are letters or symbols that are typically written adjacent to other letters and which modify their usage in some way. They are not formally combining marks (gc=Mn or gc=Mc) and do not graphically combine with the base letter that they modify. They are base characters in their own right. The sense in which they modify other letters is more a matter of their semantics in usage; they often tend to function as if they were diacritics, indicating a change in pronunciation of a letter, or otherwise distinguishing a letter’s use. Typically this diacritic modification applies to the character preceding the modifier letter, but modifier letters may sometimes modify a following character. Occasionally a modifier letter may simply stand alone representing its own sound.


Example of five each of U+0302 vs. U+02C6 vs. U+005E: ô̂̂̂̂ oˆˆˆˆˆ o^^^^^
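The same distinction shows up in each character's general category. Here is a minimal sketch, assuming a reasonably recent JDK (the result depends on the Unicode data it ships with). The class name `CategoryDemo` and the `category` helper are made up for illustration; `Character.getType` and `Character.getName` are standard library calls.

```java
// Hypothetical demo class; only Character.getType/getName come from the JDK.
final class CategoryDemo {

    // Translates the java.lang.Character type constants used here
    // into the Unicode general-category abbreviations.
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.NON_SPACING_MARK:       return "Mn";
            case Character.COMBINING_SPACING_MARK: return "Mc";
            case Character.MODIFIER_LETTER:        return "Lm";
            case Character.MODIFIER_SYMBOL:        return "Sk";
            default:                               return "other";
        }
    }

    public static void main(String[] args) {
        // The three circumflexes from the question.
        int[] codePoints = {0x0302, 0x02C6, 0x005E};
        for (int cp : codePoints) {
            System.out.printf("U+%04X %s gc=%s%n", cp, Character.getName(cp), category(cp));
        }
    }
}
```

This should report Mn for U+0302, Lm for U+02C6 and Sk for U+005E, so only the first one is formally a combining mark.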

Mark Tolonen answered Sep 27 '22 15:09


What is the difference between “combining characters” and “modifier letters”?

Combining characters

Combining characters are always applied to a preceding base character. Here is an example taken from section 5.13 Rendering Nonspacing Marks of The Unicode Standard Version 11.0 – Core Specification, where a sequence of four combining characters is applied to the base character a:

[image: combine1]

Here's another example. Running this trivial Java code...

System.out.println("Base character:                 \u0930");
System.out.println("Base with combining characters: \u0930\u0903\u0951");

...yielded this output:

[image: combine2]

In this case the output was wider than the base character; one of the combining characters was placed above the base character, and the other was placed to the right of the base character.

I've provided both examples as screenshots because it can be difficult to find a font that renders the resulting glyphs correctly.
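The different placement of the two marks in the second example matches their general categories. As a rough sketch (the class name `MarkKindsDemo` and the `kind` helper are hypothetical, and the result depends on the Unicode data in your JDK), Java's regex category classes can tell spacing marks (Mc) from nonspacing marks (Mn):

```java
// Hypothetical demo class; it relies only on java.util.regex category classes.
final class MarkKindsDemo {

    // Classifies one code point as a nonspacing mark, a spacing mark,
    // some other kind of mark, or a non-mark ("base").
    static String kind(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        if (s.matches("\\p{Mn}")) return "Mn";
        if (s.matches("\\p{Mc}")) return "Mc";
        if (s.matches("\\p{M}"))  return "M (other)";
        return "base";
    }

    public static void main(String[] args) {
        // The same sequence as in the example above: base, then two combining characters.
        "\u0930\u0903\u0951".codePoints()
            .forEach(cp -> System.out.printf("U+%04X -> %s%n", cp, kind(cp)));
    }
}
```

U+0903 should come back as Mc, which is consistent with it taking up space to the right of the base, and U+0951 as Mn, consistent with it sitting above the base.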

Modifier Letters

In contrast to combining characters, modifier letters are freestanding. While they also usually modify another character (normally but not necessarily the preceding character), they are base characters themselves and visually distinct. To use your example, here is the output from a Java application printing the base character A followed by U+0302, COMBINING CIRCUMFLEX ACCENT (◌̂) and U+02C6, MODIFIER LETTER CIRCUMFLEX ACCENT (ˆ) respectively:

A 0302: Â

A 02C6: Aˆ

The MODIFIER LETTER CIRCUMFLEX ACCENT is rendered to the right of the A whereas the COMBINING CIRCUMFLEX ACCENT is rendered above it.
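Normalization is another way to see that only the combining character attaches to its base. A small sketch, assuming the standard `java.text.Normalizer` (the class name `ComposeDemo` and the `nfc` helper are illustrative only):

```java
import java.text.Normalizer;

// Hypothetical demo class; Normalizer is the standard JDK API.
final class ComposeDemo {

    // Applies Unicode Normalization Form C (canonical composition).
    static String nfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String withCombining = "A\u0302"; // A + COMBINING CIRCUMFLEX ACCENT
        String withModifier  = "A\u02C6"; // A + MODIFIER LETTER CIRCUMFLEX ACCENT

        // The combining sequence composes into the single character U+00C2.
        System.out.println(nfc(withCombining).length());         // 1
        System.out.println(nfc(withCombining).equals("\u00C2")); // true

        // The modifier letter stays a separate character.
        System.out.println(nfc(withModifier).length());          // 2
    }
}
```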

The actual meaning (semantics) of the circumflex character as a modifier letter is context-driven. For example, in French, the circumflex on the o in côté affects its pronunciation, but the circumflex on the u in sûr does not; instead it visually distinguishes sûr (meaning "sure") from the identically pronounced sur (meaning "on"). In French a circumflex on o always affects pronunciation, and on u it never does.

Is it just a printable representation of the combining character...

No, the modifier letter carries meaning. In the case of the French circumflex, that meaning may be context-driven, based on the letter it modifies, as described above. But the meaning can also be contained within the modifier letter itself. For example:

Modifier letters are commonly used in technical phonetic transcriptional systems, where they augment the use of combining marks to make phonetic distinctions. Some of them have been adapted into regular language orthographies as well. For example, U+02BB MODIFIER LETTER TURNED COMMA is used to represent the 'okina (glottal stop) in the orthography for Hawaiian.

That example also shows that a modifier letter need not be associated with any other character. That is never the case with combining characters.
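One practical consequence is that software treats the 'okina as a letter in its own right. As a sketch (the class name `OkinaDemo` and the `isAllLetters` helper are hypothetical), a regex that accepts only characters with a Letter general category should accept the spelling with U+02BB but reject the one with an ASCII apostrophe:

```java
// Hypothetical demo class; it uses only String.matches and Character.isLetter.
final class OkinaDemo {

    // True if every character in the text has a Letter (L*) general category.
    static boolean isAllLetters(String text) {
        return text.matches("\\p{L}+");
    }

    public static void main(String[] args) {
        String withModifierLetter = "Hawai\u02BBi"; // U+02BB MODIFIER LETTER TURNED COMMA
        String withApostrophe     = "Hawai'i";      // U+0027 APOSTROPHE

        System.out.println(isAllLetters(withModifierLetter)); // true
        System.out.println(isAllLetters(withApostrophe));     // false
        System.out.println(Character.isLetter(0x02BB));       // true
    }
}
```

A combining mark such as U+0302 would fail the same check on its own terms, since its category is Mn rather than a Letter category.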

Also note that a modifier letter is not necessarily a letter in any alphabet, and the majority of modifier letters are actually symbols (e.g. the circumflex).

How is that different from the plain U+005E, CIRCUMFLEX ACCENT (^)?

That is simply the character used to represent a circumflex accent. Unlike combining characters and modifier letters, it cannot be semantically or visually associated with any other character.

See the following sections in The Unicode® Standard Version 11.0 – Core Specification for lots more detail:

  • 7.8 Modifier Letters
  • 7.9 Combining Marks
skomisa answered Sep 27 '22 16:09