Number of character cells used by string

Question

I have a program that outputs a textual table using UTF-8 strings, and I need to measure the number of monospaced character cells used by a string so I can align it properly. If possible, I'd like to do this with standard functions.

Maxim Egorushkin · Accepted Answer

From UTF-8 and Unicode FAQ for Unix/Linux:

The number of characters can be counted in C in a portable way using mbstowcs(NULL,s,0). This works for UTF-8 like for any other supported encoding, as long as the appropriate locale has been selected. A hard-wired technique to count the number of characters in a UTF-8 string is to count all bytes except those in the range 0x80 – 0xBF, because these are just continuation bytes and not characters of their own. However, the need to count characters arises surprisingly rarely in applications.
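As a minimal sketch of the `mbstowcs(NULL,s,0)` approach from the quote, applied to the table-alignment problem in the question: the helper names `init_locale`, `count_chars` and `pad_cell` are hypothetical, and the padding logic assumes every character occupies exactly one cell. The padding is done by hand because the field width in `printf("%-10s")` counts bytes, not characters.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Call once at startup so mbstowcs() decodes using the user's locale
 * (which must be a UTF-8 locale for UTF-8 input). Returns 1 on success. */
int init_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Number of characters in a multibyte string under the current locale,
 * or (size_t)-1 if the bytes are not valid in that encoding. */
size_t count_chars(const char *s)
{
    return mbstowcs(NULL, s, 0);
}

/* Hypothetical helper: copy s into out, left-aligned and space-padded to
 * `width` characters. Assumes one cell per character.
 * Returns 0 on success, -1 if s is invalid or out is too small. */
int pad_cell(char *out, size_t outsz, const char *s, size_t width)
{
    assert(out != NULL && s != NULL);

    size_t chars = count_chars(s);
    if (chars == (size_t)-1)
        return -1;

    size_t pad = chars < width ? width - chars : 0;
    size_t bytes = strlen(s);
    if (bytes + pad + 1 > outsz)
        return -1;

    memcpy(out, s, bytes);          /* the text itself, byte for byte */
    memset(out + bytes, ' ', pad);  /* one space per missing character */
    out[bytes + pad] = '\0';
    return 0;
}
```

A character count equals a cell count only for text without double-width (for example CJK) or zero-width (combining) characters. For such text, POSIX `wcswidth()` applied to the converted wide string is the usual tool; this sketch does not cover that case.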

mpez0 · Answer

You may or may not have a UTF-8 compatible strlen(3) function available. However, there are some simple C functions readily available that do the job quickly.

The efficient C solutions look at the top two bits of each byte to skip continuation bytes, which always have the form 10xxxxxx. The simple code (referenced from the link above) is

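    int my_strlen_utf8_c(const char *s) {
        int i = 0, j = 0;   /* i: byte index, j: character count */
        while (s[i]) {
            /* count every byte that is not a continuation byte (10xxxxxx) */
            if ((s[i] & 0xc0) != 0x80) j++;
            i++;
        }
        return j;
    }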

The faster version uses the same technique, but prefetches data and does multi-byte compares, resulting in a substantial speedup. The code is longer and more complex, however.
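The faster version itself is not reproduced here. As a rough, independent sketch of the multi-byte idea (not the code from the linked article), the function below tests eight bytes per step. The name `utf8_count_fast` is hypothetical, and it takes an explicit length, an assumption made here so that it never reads past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES 0x0101010101010101ULL  /* lowest bit set in each of 8 bytes */

/* Count UTF-8 characters in the first len bytes of s by counting
 * continuation bytes (10xxxxxx) and subtracting them from len. */
size_t utf8_count_fast(const char *s, size_t len)
{
    assert(s != NULL);

    size_t continuation = 0;
    size_t i = 0;

    /* Main loop: examine 8 bytes at a time. */
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);  /* alignment-safe load */

        /* Per byte: bit 7 set AND bit 6 clear, moved to that byte's bit 0. */
        uint64_t mask = (w >> 7) & (~w >> 6) & ONES;

        /* Multiplying by ONES sums the 8 flag bytes into the top byte. */
        continuation += (size_t)((mask * ONES) >> 56);
    }

    /* Tail: the remaining 0..7 bytes, one at a time. */
    for (; i < len; i++)
        if (((unsigned char)s[i] & 0xC0) == 0x80)
            continuation++;

    return len - continuation;
}
```

For a NUL-terminated string it can be called as `utf8_count_fast(s, strlen(s))`. Whether it actually beats the simple loop depends on the compiler and input, so it is worth measuring before adopting it.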
