
Programmatically tell if a Unicode character takes up more than one character space in a terminal

I discovered that in the Mac OS X Terminal, some Unicode characters take up more than one character cell. One example is U+27FC (LONG RIGHTWARDS ARROW FROM BAR). It prints two characters wide, but its second half prints on top of whatever character comes next, so you have to write ⟼<space> for it to display correctly. For example, ⟼a prints like this: [image: Arrow + a] (I made the font size large so that you could see it, but it happens at all font sizes).

By the way, this is the Menlo font in the Mac OS X 10.6 Terminal application.

U+23B2 (SUMMATION TOP) actually prints two characters wide and two characters tall. It does this in the browser too, at least in Safari; notice how it overlaps the line above: ⎲

However, in the terminal in Ubuntu, none of these characters print wider or taller than one character.

Is there a way to programmatically tell if a character takes up more than one space?

I'm using Python, so something that works either in pure Python or on POSIX (i.e., I can call some bash command using the os module) would be preferred.

Also, I should note that if I increase the "Character Spacing" setting in the terminal's font settings to 1.5 (from the default 1.0), then it looks like this: [image: Arrow + a spaced]

Also, it'd be nice if an answer could give some insight into all of this (i.e., why does it happen?)

asmeurer asked Aug 17 '11 00:08


1 Answer

While it's not relevant for the specific examples you give (all of which display at the size of a single character for me on Ubuntu), CJK characters have a Unicode property, East Asian Width, which indicates that they are wider than normal, and they display at double width in some terminals.

For example, in Python:

# 'a' is a normal (narrow) character; its East Asian Width is 'Na'
# '愛' can be interpreted as a double-width (wide) character; its East Asian Width is 'W'
import unicodedata
assert unicodedata.east_asian_width('a') == 'Na'
assert unicodedata.east_asian_width('愛') == 'W'

Apart from this, I don't think there's a specification for how much space certain characters should take up, other than the size of the glyph in whatever font you are using (which your terminal is probably ignoring for the reason Ignacio gave).

For more information on the East Asian Width property, see http://www.unicode.org/reports/tr11/
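As a rough sketch of how that property could be turned into a width estimate for a whole string: the `display_width` name below is made up for illustration, and the code assumes (this is not guaranteed for any particular terminal or font) that Wide and Fullwidth characters occupy two columns, combining marks occupy none, and everything else occupies one. It is written for Python 3, where string literals are Unicode.

```python
import unicodedata

def display_width(text):
    """Estimate how many terminal columns `text` occupies.

    Hypothetical helper: it relies only on the East Asian Width property,
    so it cannot see font-specific glyph overflow.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on top of the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ('W', 'F'):
            # Wide and Fullwidth characters: assumed to take two columns.
            width += 2
        else:
            width += 1
    return width

print(display_width('a'))       # 1
print(display_width('愛'))      # 2
print(display_width('\u27fc'))  # 1: U+27FC is Neutral, not Wide
```

Note that this reports 1 for U+27FC, so it would not detect the overlap described in the question. That fits the point above: the extra width there seems to come from the glyph in the font, not from a Unicode property.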

mesilliac answered Sep 23 '22 08:09