The width specifier in printf does not work properly with accented characters

Tags: c, printf

I'm trying to format the output of some strings in C using printf and the width specifier, but I can't get the behaviour I want. Every time printf encounters the character å, ä or ö, the width reserved for the string shrinks by one position.

A code-snippet to illustrate:

#include <stdio.h>

int main(void)
{
  printf(">%-10s<\n", "aoa");
  printf(">%-10s<\n", "aäoa");
  printf(">%-10s<\n", "aäoöa");
  printf(">%-10s<\n", "aäoöaå");

  return 0;
}

Output in my Ubuntu Linux bash shell:

>aoa       <
>aäoa     <
>aäoöa   <
>aäoöaå <

I'm looking for advice on how to deal with this. I want every string in the snippet above to print in a space-padded field 10 characters wide, like so:

>aoa       <
>aäoa      <
>aäoöa     <
>aäoöaå    <

I'd also appreciate any insight into why this is happening, or feedback on whether it also happens on other setups.

asked Feb 16 '16 08:02 by Erik Göök



2 Answers

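Use wide character strings and wprintf. The field width then counts wide characters rather than bytes:

#include <locale.h>
#include <wchar.h>

int main(void)
{
  /* Adopt the environment's locale so that wide characters are
     converted to the terminal's encoding (e.g. UTF-8) on output. */
  setlocale(LC_ALL, "");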

  wprintf(L">%-10ls<\n", L"aoa");
  wprintf(L">%-10ls<\n", L"aäoa");
  wprintf(L">%-10ls<\n", L"aäoöa");
  wprintf(L">%-10ls<\n", L"aäoöaå");

  return 0;
}
answered Nov 10 '22 01:11 by Flopp


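Why is this happening?

The field width of a %s conversion is counted in bytes, not in characters. In UTF-8, each of å, ä and ö is encoded as two bytes, so every such character uses up one extra position of the field. For background, take a look at The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.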

As an alternative to wide chars, and assuming UTF-8, you can count the continuation bytes in the string (the extra bytes that multi-byte characters occupy beyond their first one) and add that count to the width passed to printf:

#include <stdio.h>

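/* Count UTF-8 continuation bytes (bit pattern 10xxxxxx), i.e. the
   bytes a multi-byte character occupies beyond its first one. */
int func(const char *str)
{
    int len = 0;

    while (*str != '\0') {
        if ((*str & 0xc0) == 0x80) {
            len++;
        }
        str++;
    }
    return len;
}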

int main(void)
{
    printf(">%-*s<\n", 10 + func("aoa"), "aoa");
    printf(">%-*s<\n", 10 + func("aäoa"), "aäoa");
    printf(">%-*s<\n", 10 + func("aäoöa"), "aäoöa");
    printf(">%-*s<\n", 10 + func("aäoöaå"), "aäoöaå");
    return 0;
}

Output:

>aoa       <
>aäoa      <
>aäoöa     <
>aäoöaå    <
answered Nov 10 '22 00:11 by David Ranieri