When I output certain emoji (specifically flags) into a subwindow in curses, it deforms the output, even for output outside of that subwindow.
Running:
import curses

def draw_screen(stdscr):
    event = 0
    stdscr.clear()
    stdscr.refresh()
    while event != ord('q'):
        emojis = ["🇺🇸", "💚", "💙"]
        # emojis = ["💛", "💚", "💙"]
        for i, emoji in enumerate(emojis):
            # One 11x11 box per emoji, laid out side by side
            box1 = stdscr.subwin(11, 11, 0, i*12)
            box1.box()
            box1.addstr(0, 4, emoji)
        event = stdscr.getch()

if __name__ == "__main__":
    curses.wrapper(draw_screen)
Produces: [screenshot: deformed output]
If you switch the emoji for just the hearts it works fine: [screenshot: correct output]
I understand that the flag emoji is a sequence of regional indicator symbols, but I would have thought this should work, and I'm unsure how to fix it.
I've tested this in iTerm and Terminal on macOS 10.13 and 10.14.
(I also noticed that some other multi-code-point emoji (👩‍🔬) print fine in raw Python but in curses are split into the two separate emoji that make them up. I'm not sure if this is related.)
ncurses uses the operating system's wcwidth function to determine how wide a character will display. Terminal.app assumes that U+1F1FA and U+1F1F8 use two columns, while it appears that wcwidth says they are only one column each. The green heart U+1F49A is treated by both wcwidth and Terminal.app as double-width. You can see this by adding a character before and after the emoji symbol: where ncurses is misled, the resulting display shows overlapping characters.
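One way to try the before-and-after test described above is a small probe like the following. It is only a sketch: the names `bracketed` and `draw_probe` are made up for illustration, and what you see depends entirely on your terminal and the system's wcwidth. If ncurses and the terminal disagree about a width, the closing bracket should land on top of (or away from) the emoji.

```python
import curses
import locale
import sys

# Code points taken from the discussion above.
FLAG_US = "\U0001F1FA\U0001F1F8"   # two regional indicator symbols
GREEN_HEART = "\U0001F49A"


def bracketed(text):
    """Wrap text in ASCII brackets so a column mismatch becomes visible."""
    return "[" + text + "]"


def draw_probe(stdscr):
    """Draw each emoji between brackets and wait for a key press."""
    stdscr.clear()
    stdscr.addstr(0, 0, bracketed(GREEN_HEART))
    stdscr.addstr(1, 0, bracketed(FLAG_US))
    stdscr.addstr(3, 0, "Press any key to exit")
    stdscr.refresh()
    stdscr.getch()


# Only start curses when attached to a real terminal.
if __name__ == "__main__" and sys.stdin.isatty() and sys.stdout.isatty():
    locale.setlocale(locale.LC_ALL, "")  # use the terminal's encoding
    curses.wrapper(draw_probe)
```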
Until Unicode 9 (2016), those particular codes were all "neutral width" according to Unicode's EastAsianWidth file. Unicode Technical Report #11, Unicode Character Property "East Asian Width" (from 1999), implies (but never clearly defines) that the actual width of a "neutral width" character depends upon the context, i.e., if it is used in conjunction with double-width characters, it should be treated as double-width. For instance, it says
"Narrow (and neutral) characters always map to half-width characters in the mixed-width set"
but refers to "mixed-width" solely in terms of a mixture of "full-width" (two columns) and "narrow-width" (one column) characters.
The wcwidth function usually (macOS is probably not an exception) returns the same width for a given code point regardless of locale settings.
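To see what the C library itself reports, you can call wcwidth from Python through ctypes. This is a rough sketch that assumes a POSIX system with a 32-bit wchar_t; `libc_wcwidth` is a hypothetical helper name, and the printed values will differ between platforms and library versions.

```python
import ctypes
import ctypes.util
import locale

# Assumption: a POSIX C library reachable through ctypes. If find_library
# returns None, CDLL(None) uses the symbols already loaded in the process.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))
_libc.wcwidth.argtypes = [ctypes.c_wchar]
_libc.wcwidth.restype = ctypes.c_int


def libc_wcwidth(char):
    """Return the C library's wcwidth() for a single character."""
    return _libc.wcwidth(char)


# wcwidth consults the current locale; try to adopt the user's settings.
try:
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    pass  # keep the default locale if the environment names an unknown one

for ch in ("\U0001F1FA", "\U0001F1F8", "\U0001F49A"):
    print(f"U+{ord(ch):04X}: wcwidth = {libc_wcwidth(ch)}")
```

A result of -1 usually means the current locale cannot represent the character, so run it from a UTF-8 locale.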
In Unicode 8, these are the relevant lines (a range of values):
1F1E6..1F1FF;N # So [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z
1F400..1F579;N # So [378] RAT..JOYSTICK
In Unicode 9, U+1F49A is "wide" (W), but the other two are neutral:
1F1E6..1F1FF;N # So [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z
1F442..1F4FC;W # So [187] EAR..VIDEOCASSETTE
I don't see that those changed afterwards, through Unicode 12 (current).
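If you would rather not read the data file, Python's unicodedata module exposes the same property. The sketch below is just a convenience check (the helper name `east_asian_widths` is made up here); it reports whatever Unicode version your Python build ships, so on a build with Unicode 9 or later data it should agree with the lines quoted above.

```python
import unicodedata


def east_asian_widths(text):
    """Return (code point, East Asian Width) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.east_asian_width(ch))
            for ch in text]


print("Unicode data version:", unicodedata.unidata_version)

# The two regional indicators that form the flag, then the green heart.
for code_point, width in east_asian_widths("\U0001F1FA\U0001F1F8\U0001F49A"):
    print(code_point, width)
```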
Given all of that, it looks like an error in the font and/or wcwidth, which is carried forward by inertia (not much you can do about it until Apple gets around to making wcwidth agree with its fonts).
By the way, you may find Proposal on use of ZERO WIDTH JOINER (ZWJ) between two Regional Indicator Symbols relevant to the problem.