
Number of character cells used by string

I have a program that outputs a textual table using UTF-8 strings, and I need to measure the number of monospaced character cells used by a string so I can align it properly. If possible, I'd like to do this with standard functions.

codemuppet asked Feb 25 '11 12:02



2 Answers

From UTF-8 and Unicode FAQ for Unix/Linux:

The number of characters can be counted in C in a portable way using mbstowcs(NULL,s,0). This works for UTF-8 like for any other supported encoding, as long as the appropriate locale has been selected. A hard-wired technique to count the number of characters in a UTF-8 string is to count all bytes except those in the range 0x80 – 0xBF, because these are just continuation bytes and not characters of their own. However, the need to count characters arises surprisingly rarely in applications.
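As a minimal sketch of the portable approach quoted above (the wrapper name count_chars_locale is made up for illustration, it is not a standard function), the call can be wrapped like this. Note that it counts characters, which matches the number of cells only when every character occupies exactly one cell:

```c
#include <stdlib.h>

/* Hypothetical helper: number of characters in the multibyte string s,
 * decoded according to the current LC_CTYPE locale.
 * Returns (size_t)-1 if s contains an invalid multibyte sequence. */
size_t count_chars_locale(const char *s) {
    /* With a NULL destination, mbstowcs only counts and stores nothing. */
    return mbstowcs(NULL, s, 0);
}
```

For UTF-8 input this assumes the program has selected a UTF-8 locale beforehand, typically by calling setlocale(LC_ALL, "") from <locale.h> once at startup. In the default "C" locale only ASCII input is guaranteed to be counted as expected.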

Maxim Egorushkin answered Sep 21 '22 02:09



You may or may not have a UTF-8 compatible strlen(3) function available. However, there are some simple C functions readily available that do the job quickly.

The efficient C solutions test the top two bits of each byte to skip continuation bytes, which always have the form 10xxxxxx. The simple code (referenced from the link above) is:

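    #include <stddef.h>

    /* Count UTF-8 characters: every byte except continuation bytes. */
    size_t my_strlen_utf8_c(const char *s) {
        size_t count = 0;
        for (; *s; s++) {
            if ((*s & 0xc0) != 0x80)   /* not of the form 10xxxxxx */
                count++;
        }
        return count;
    }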

The faster version uses the same technique, but prefetches data and does multi-byte compares, resulting in a substantial speedup. The code is longer and more complex, however.
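To give a rough idea of what "multi-byte compares" can look like, here is a simplified sketch. It is not the code from the referenced link: it assumes the caller passes the byte length explicitly instead of scanning for the terminating NUL, it does no prefetching, and the name utf8_count_wordwise is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The lowest bit of every byte in a 64-bit word. */
#define ONES 0x0101010101010101ULL

/* Hypothetical helper: count UTF-8 characters in the first len bytes of s
 * by counting continuation bytes (10xxxxxx) eight at a time. */
size_t utf8_count_wordwise(const char *s, size_t len) {
    size_t continuation = 0;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);   /* safe unaligned load */
        /* For each byte: low bit set iff bit 7 is 1 and bit 6 is 0. */
        uint64_t mask = (w >> 7) & (~w >> 6) & ONES;
        /* Multiplying by ONES sums the eight flags into the top byte. */
        continuation += (size_t)((mask * ONES) >> 56);
    }

    /* Handle the remaining 0..7 bytes one at a time. */
    for (; i < len; i++) {
        if (((unsigned char)s[i] & 0xC0) == 0x80)
            continuation++;
    }

    return len - continuation;
}
```

A call such as utf8_count_wordwise(s, strlen(s)) should give the same result as the simple loop. Whether it is actually faster depends on the compiler and the input, so it is worth measuring before adopting it.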

mpez0 answered Sep 22 '22 02:09
