
Why is © (the copyright symbol) replaced with (C) when using wprintf?

Tags: c, printf, widechar

When I try to print the copyright symbol © with printf or write, it works just fine:

#include <stdio.h>

int main(void)
{
    printf("©\n");
}

#include <unistd.h>

int main(void)
{
    write(1, "©\n", 3);
}

Output:

©

But when I try to print it with wprintf, I get (C):

#include <stdio.h>
#include <wchar.h>

int main(void)
{
    wprintf(L"©\n");
}

Output:

(C)

It's fixed when I add a call to setlocale, though:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "");
    wprintf(L"©\n");
}

Output:

©

Why is the original behavior present and why is it fixed when I call setlocale? Additionally, where does this conversion take place? And how can I make the behavior after setlocale the default?

compilation command:

gcc test.c

locale:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

echo $LC_CTYPE:


uname -a:

Linux penguin 4.19.79-07511-ge32b3719f26b #1 SMP PREEMPT Mon Nov 18 17:41:41 PST 2019 x86_64 GNU/Linux

file test.c (same on all of the examples):

test.c: C source, UTF-8 Unicode text

gcc --version:

gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

/lib/x86_64-linux-gnu/libc-2.24.so (glibc version):

GNU C Library (Debian GLIBC 2.24-11+deb9u4) stable release version 2.24, by Roland McGrath et al.
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 6.3.0 20170516.
Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.

cat /etc/debian_version:

9.12
S.S. Anne asked Feb 28 '20 20:02



1 Answer

The locale settings in the calling process's environment are not applied automatically to the new process.

When the program first starts up, it is in the C locale. The man page for setlocale(3) says the following:

On startup of the main program, the portable "C" locale is selected as default. A program may be made portable to all locales by calling:

setlocale(LC_ALL, "");

...

The locale "C" or "POSIX" is a portable locale; its LC_CTYPE part corresponds to the 7-bit ASCII character set.

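So in the C locale, wprintf cannot encode a non-ASCII wide character such as © directly. glibc instead transliterates it into one or more ASCII characters during the wide-to-multibyte conversion on output, which is why you see (C).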

The locale can be set as follows:

setlocale(LC_ALL, "");

The LC_ALL category selects every locale category at once (LC_CTYPE, LC_NUMERIC, LC_TIME, and so on). An empty string for the locale name means the locale is taken from the relevant environment variables. Once this is done, wide characters are encoded in your shell's locale, so you should see © as expected.

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

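int main(void)
{
    /* Copy the name: a later setlocale() call may overwrite the returned string. */
    char before[256];
    snprintf(before, sizeof before, "%s", setlocale(LC_ALL, NULL));

    /* Switch from the startup "C" locale to the one named by the environment. */
    setlocale(LC_ALL, "");
    const char *after = setlocale(LC_ALL, NULL);

    wprintf(L"before locale: %s\n", before);
    wprintf(L"after locale: %s\n", after);
    wprintf(L"©\n");
    wprintf(L"\u00A9\n");   /* same character, written as a universal character name */
    return 0;
}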

Output:

before locale: C
after locale: en_US.utf8
©
©
dbush answered Sep 17 '22 23:09