Curses - Certain emoji (flags) deform terminal output

When I output certain emoji (specifically flags) into a subwindow in curses, it deforms the output, even for output outside of that subwindow.

Running:


import curses

def draw_screen(stdscr):

    event = 0
    stdscr.clear()
    stdscr.refresh()

    while (event != ord('q')):

        emojis = ["🇺🇸", "💚", "💚"]
        # emojis = ["💚", "💚", "💚"]

        for i, emoji in enumerate(emojis):
            box1 = stdscr.subwin(11, 11, 0, i*12)
            box1.box()
            box1.addstr(0, 4, emoji)

        event = stdscr.getch()

if __name__ == "__main__":
    curses.wrapper(draw_screen)

Produces:

[screenshot]

If you switch the emoji for just the hearts it works fine:

[screenshot]

I understand that the flag emoji is a sequence of regional indicator symbols, but I would have thought this should work, and I'm unsure how to fix it.

I've tested this in iTerm and Terminal on macOS 10.13 and 10.14.

(I also noticed that some other multi-code-point emoji, such as 👩‍🔬, print fine in plain Python but in curses are split into the two separate emoji that make them up. I'm not sure if this is related.)

Tom Anthony asked Mar 04 '19 12:03


1 Answer

ncurses uses the operating system's wcwidth function to determine how many columns a character occupies. Terminal.app assumes that U+1F1FA and U+1F1F8 use two columns, while wcwidth apparently reports only one column each. The green heart U+1F49A is treated as double-width by both wcwidth and Terminal.app. You can see this by placing a character before and after the emoji symbol: where ncurses is misled, the resulting display shows overlapping characters.

[image: illustration of overlap]
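
The probe described above can be sketched roughly as follows. This is only an illustration, not the exact program used for the screenshot: the names `SAMPLES`, `probe_lines` and `draw_probes` are made up here, and square brackets are an arbitrary choice of marker character.

```python
import curses

# Samples: the flag from the question (two regional indicators) and the green heart.
SAMPLES = ["\U0001F1FA\U0001F1F8", "\U0001F49A"]

def probe_lines(samples):
    """Surround each sample with ASCII markers so misplaced columns become visible."""
    return [f"[{s}]" for s in samples]

def draw_probes(stdscr):
    """Write one probe per row; a marker drawn over the emoji indicates a width mismatch."""
    stdscr.clear()
    for row, line in enumerate(probe_lines(SAMPLES)):
        stdscr.addstr(row, 0, line)
    stdscr.refresh()
    stdscr.getch()  # wait for a key before the terminal is restored

# Run from a real terminal with: curses.wrapper(draw_probes)
```

If the closing bracket lands on top of the emoji, or a gap appears before it, the terminal and ncurses disagree about how many columns the emoji takes.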

Until Unicode 9 (2016), those particular codes were all "neutral width" according to Unicode's EastAsianWidth file. Unicode Technical Report #11, Unicode Character Property "East Asian Width" (from 1999), implies (but never clearly defines) that the actual width of a "neutral width" character depends on context, i.e., when such characters are used in conjunction with double-width characters, they should be treated as double-width. For instance, it says:

Narrow (and neutral) characters always map to half-width characters in the mixed-width set

but refers to "mixed-width" solely in terms of a mixture of "full-width" (two columns) and "narrow-width" (one column) characters.

The wcwidth function usually (macOS is probably no exception) returns the same width for a given code point regardless of locale settings.
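
One way to check what the platform's wcwidth reports is to call it through ctypes. The sketch below assumes a Unix-like system where `ctypes.CDLL(None)` exposes the C library and where wchar_t is a 32-bit integer (true on macOS and Linux, as far as I know); `os_wcwidth` is a name invented for this example. The values printed will differ between systems.

```python
import ctypes
import locale

# Initialise LC_CTYPE from the environment; some C libraries consult it in wcwidth.
try:
    locale.setlocale(locale.LC_CTYPE, "")
except locale.Error:
    pass  # keep whatever locale is already active

# Assumption: dlopen(NULL) gives access to the C library's wcwidth symbol.
_libc = ctypes.CDLL(None)
_libc.wcwidth.argtypes = [ctypes.c_int]  # wchar_t passed as a 32-bit integer
_libc.wcwidth.restype = ctypes.c_int

def os_wcwidth(ch):
    """Return the column width the C library reports for one code point."""
    return _libc.wcwidth(ord(ch))

for ch in ["a", "\U0001F1FA", "\U0001F1F8", "\U0001F49A"]:
    print(f"U+{ord(ch):04X} -> {os_wcwidth(ch)}")
```

A result of 1 for the two regional indicators alongside 2 for the green heart would match the mismatch described above; -1 means the C library does not consider the code point printable in the current locale.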

In Unicode 8, these are the relevant lines (a range of values):

1F1E6..1F1FF;N   # So    [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z
1F400..1F579;N   # So   [378] RAT..JOYSTICK

In Unicode 9, U+1F49A is "wide" (W), but the other two are still neutral (N):

1F1E6..1F1FF;N   # So    [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z
1F442..1F4FC;W   # So   [187] EAR..VIDEOCASSETTE

I don't see that those changed afterwards, through Unicode 12 (current).
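
Python's unicodedata module exposes the same property, using whichever Unicode version the interpreter was built with, so the classes can be checked without reading the data file. A minimal sketch (`eaw_report` is a helper name made up for this example):

```python
import unicodedata

def eaw_report(text):
    """Return (code point, East Asian Width class) for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.east_asian_width(ch)) for ch in text]

# The Unicode version behind these answers depends on the Python build.
print("Unicode data version:", unicodedata.unidata_version)
print("flag :", eaw_report("\U0001F1FA\U0001F1F8"))
print("heart:", eaw_report("\U0001F49A"))
```

With Unicode 9 or later data, the lines quoted above suggest the two regional indicators should come back as N and the green heart as W.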

Given all of that, it looks like an error in the font and/or wcwidth, which is carried forward by inertia (not much you can do about it until Apple gets around to making wcwidth agree with its fonts).

By the way, you may find Proposal on use of ZERO WIDTH JOINER (ZWJ) between two Regional Indicator Symbols relevant to the problem.

Thomas Dickey answered Nov 17 '22 22:11