
Finding combining diacritics for character at point in Emacs

I am writing a function that returns linguistic information about the character at point. This is easy for pre-composed characters, but I also want to account for diacritics. I believe Unicode refers to these as "marks" or "combining characters" (cf. the Combining Diacritical Marks block, U+0300 - U+036F).

For example, to place the breve diacritic (U+0306, COMBINING BREVE) on the character e:

e C-x 8 <RET> 0306 <RET>

Run C-u C-x = on the resulting character and you will see something like "Composed with the following character(s) ̆ "

Functions such as following-char unfortunately only return the base character, i.e. "e", and ignore any combining diacritics. Is there any way to get these?

EDIT: slitvinov pointed out that the resulting glyph consists of two characters. If you place point before the glyph created above and evaluate (point) before and after running forward-char interactively, point increases by 2. I figured I could hack a solution out of this behaviour, but inside a progn form (or a function definition) forward-char only moves point forward by one. Try it in a defun, or with (progn (forward-char) (point)). Why might this be?

Asked by Dan, Nov 03 '22 19:11


1 Answer

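I think the e with a diacritic is stored as two characters: the base e followed by the combining mark. I put the sequence e, e-with-diacritic, e in a file and evaluated char-after at each of the first four positions:

eĕe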
(char-after 1)
(char-after 2)
(char-after 3)
(char-after 4)

This gives me:

101 101 774 101

And 774 is the decimal value of hexadecimal 0306, so the combining mark sits at position 3, right after its base character.
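
Building on this, here is a minimal sketch of a helper that collects the combining marks following a base character. The names my/combining-marks-after and my/describe-char-at-point are hypothetical, and the sketch assumes that a "mark" is any character whose Unicode general category is Mn, Mc, or Me, as reported by get-char-code-property.

```elisp
;; Hypothetical helper names; a sketch, not a definitive implementation.
(defun my/combining-marks-after (pos)
  "Return a list of the combining marks that follow the character at POS.
Assumption: a combining mark is any character whose Unicode general
category is Mn, Mc, or Me."
  (let ((p (1+ pos))
        (marks '())
        ch)
    ;; `char-after' returns nil past the end of the buffer, which ends the loop.
    (while (and (setq ch (char-after p))
                (memq (get-char-code-property ch 'general-category)
                      '(Mn Mc Me)))
      (push ch marks)
      (setq p (1+ p)))
    (nreverse marks)))

(defun my/describe-char-at-point ()
  "Show the base character at point and any combining marks after it."
  (interactive)
  (message "base: %S marks: %S"
           (char-after)
           (my/combining-marks-after (point))))
```

With point at position 2 in the buffer above (just before the e that carries the diacritic), M-x my/describe-char-at-point should report base 101 and marks (774). Because the helper reads buffer positions with char-after instead of moving point, it does not depend on how forward-char behaves. The forward-char difference described in the question is probably the command loop moving point out of the middle of a composed sequence once a command finishes (see "Adjusting Point After Commands" in the Emacs Lisp manual); inside a progn that adjustment has not happened yet.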

Answered by slitvinov, Nov 15 '22 11:11