
Determine whether a Unicode character is fullwidth or halfwidth in C++

Tags: c++, unicode

I'm writing a terminal (console) application that is supposed to wrap arbitrary Unicode text.

Terminals usually use a monospaced (fixed-width) font, so wrapping text is little more than counting characters, checking whether the next word still fits on the current line, and breaking the line if it does not.

The problem is that Unicode contains fullwidth characters that take up the width of 2 characters in a terminal.

A naive count sees 1 Unicode character, but the printed character is 2 "normal" (halfwidth) characters wide. This breaks the wrapping routine, because it is not aware of characters that take up twice the width.

As an example, this is a fullwidth character (U+3004, the JIS symbol), shown above two halfwidth digits:

〄
12

It does not take up the full width of 2 characters here although it's preformatted, but it does use twice the width of a western character in a terminal.

To deal with this, I have to distinguish between fullwidth and halfwidth characters, but I cannot find a way to do so in C++. Is it really necessary to know every fullwidth character in the Unicode table to get around the problem?

Noice asked Feb 27 '13 14:02



2 Answers

You can use ICU's u_getIntPropertyValue with the UCHAR_EAST_ASIAN_WIDTH property.

For example:

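#include <unicode/uchar.h>  // ICU: u_getIntPropertyValue, UCHAR_EAST_ASIAN_WIDTH

// True if c occupies two terminal cells (East Asian Width "Fullwidth" or "Wide").
bool is_fullwidth(UChar32 c) {
    int width = u_getIntPropertyValue(c, UCHAR_EAST_ASIAN_WIDTH);
    return width == U_EA_FULLWIDTH || width == U_EA_WIDE;
}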

Note that if your graphics library supports combining characters then you'll have to consider those as well when determining how many cells a sequence uses; for example e followed by U+0301 COMBINING ACUTE ACCENT will only take up 1 cell.
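To make the cell counting concrete, here is a minimal sketch of how a whole sequence could be measured. The names display_width and in_combining_diacriticals are hypothetical helpers, not part of ICU, and the combining check is a deliberate simplification that only covers the U+0300..U+036F block. The width predicate is passed in, so something like the is_fullwidth function above could be plugged in.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: sums the terminal cells used by a UTF-32 string.
// is_wide decides whether a code point occupies two cells;
// is_combining decides whether it occupies none.
std::size_t display_width(const std::u32string& text,
                          const std::function<bool(char32_t)>& is_wide,
                          const std::function<bool(char32_t)>& is_combining) {
    std::size_t cells = 0;
    for (char32_t c : text) {
        if (is_combining(c)) continue;   // attaches to the previous character
        cells += is_wide(c) ? 2 : 1;     // wide characters take two cells
    }
    return cells;
}

// Simplifying assumption: only the Combining Diacritical Marks block
// (U+0300..U+036F) is treated as combining. Real text needs the full data.
bool in_combining_diacriticals(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}
```

With a predicate that reports U+3004 as wide, the string "e" followed by U+0301 measures 1 cell, and "a〄b" measures 4 cells.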

ecatmur answered Sep 19 '22 02:09



There's no need to build the tables yourself; Markus Kuhn has already done that in his wcwidth implementation:

http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

The same code is used in terminal emulators such as xterm and konsole, and quite likely others.
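For the wrapping routine from the question, a width function of this kind slots in where the character count used to be. The following is only a sketch in that spirit: cell_width and wrap are hypothetical names, and the range table is a small, hand-picked and approximate subset chosen for illustration, not the table from the linked file. A real program should use the complete data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Abridged, illustrative ranges only (an assumption for this sketch).
struct Range { char32_t first, last; };

constexpr Range kWide[] = {
    {0x1100, 0x115F},   // Hangul Jamo initial consonants
    {0x3000, 0x303E},   // CJK Symbols and Punctuation (includes U+3004)
    {0x3041, 0x30FF},   // Hiragana and Katakana
    {0x4E00, 0x9FFF},   // CJK Unified Ideographs
    {0xAC00, 0xD7A3},   // Hangul Syllables
    {0xFF01, 0xFF60},   // Fullwidth forms
};

// Returns 0 for combining marks in U+0300..U+036F, 2 for code points in the
// abridged wide table, 1 otherwise.
int cell_width(char32_t c) {
    if (c >= 0x0300 && c <= 0x036F) return 0;
    for (const Range& r : kWide)
        if (c >= r.first && c <= r.last) return 2;
    return 1;
}

// Greedy word wrap on spaces, measuring words in terminal cells rather than
// in characters. A word wider than `columns` is left on its own line.
std::vector<std::u32string> wrap(const std::u32string& text, std::size_t columns) {
    std::vector<std::u32string> lines;
    std::u32string line, word;
    std::size_t line_cells = 0, word_cells = 0;

    auto flush_word = [&] {
        if (word.empty()) return;
        // Start a new line if the word plus a separating space does not fit.
        if (!line.empty() && line_cells + 1 + word_cells > columns) {
            lines.push_back(line);
            line.clear();
            line_cells = 0;
        }
        if (!line.empty()) { line += U' '; ++line_cells; }
        line += word;
        line_cells += word_cells;
        word.clear();
        word_cells = 0;
    };

    for (char32_t c : text) {
        if (c == U' ') { flush_word(); continue; }
        word += c;
        word_cells += cell_width(c);
    }
    flush_word();
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```

Wrapping "ab 〄〄 cd" to 5 columns yields three lines, because "〄〄" needs 4 cells and no longer fits after "ab ". A routine that counts characters would see only 5 characters in "ab 〄〄" and wrongly keep it on one line.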

kralyk answered Sep 21 '22 02:09
