
How do I print a Unicode character in C, encoded as UTF-8?

I am trying to print the magnifying glass character (http://www.fileformat.info/info/unicode/char/1f50e/index.htm), and I get this error:

[niko@dev1 ncurses]$ gcc -o utf8 -std=c99 $(ncursesw5-config --cflags --libs) utf8.c 
utf8.c: In function ‘main’:
utf8.c:12:10: error: \ud83d is not a valid universal character
   printw("\ud83d\udd0e\n");         // escaped Unicode 
          ^
[niko@dev1 ncurses]$ cat utf8.c
#include <locale.h>
#include <curses.h>
#include <stdlib.h>


int main (int argc, char *argv[])
{
  setlocale(LC_ALL, "");

  initscr();

  printw("\ud83d\udd0e\n");         // escaped Unicode 

  getch();
  endwin();

  return EXIT_SUCCESS;
}

What is the problem here? For example, if I have the decimal value of the encoding, which for the magnifying glass is 55357, how would I print it with printf to an ncurses screen (without using wchar_t, because it wastes a lot of memory)?

Nulik asked Mar 12 '23 08:03


1 Answer

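The escape shown on fileformat.info is misleading for C. The page lists \ud83d\udd0e, which is a UTF-16 surrogate pair as used in Java. It does not work in C, because each \u escape must name one complete Unicode code point, and half of a surrogate pair is not one: the C standard excludes the range D800 through DFFF from universal character names. The decimal value 55357 from the question is 0xD83D, the high half of that pair; the character's actual code point is U+1F50E (decimal 128270).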
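To make the relationship between the surrogate pair and the real code point concrete, here is a minimal sketch of the standard UTF-16 combination arithmetic. The function name combine_surrogates is a hypothetical helper introduced for illustration; it is not part of ncurses or the C library.

```c
#include <stdint.h>

/* Hypothetical helper (illustration only): combine a UTF-16 surrogate
 * pair into the Unicode code point it encodes.
 * Returns 0 if the inputs are not a valid high/low surrogate pair. */
uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF)
        return 0;

    /* Each half carries 10 bits; the pair encodes (code point - 0x10000). */
    uint32_t top    = (uint32_t)(high - 0xD800) << 10;
    uint32_t bottom = (uint32_t)(low - 0xDC00);
    return 0x10000u + (top | bottom);
}
```

Calling combine_surrogates(0xD83D, 0xDD0E) yields 0x1F50E, which is the value to use in a C escape, not either half on its own.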

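You should instead use \U (uppercase) with 8 hexadecimal digits, so U+1F50E becomes \U0001F50E. With a UTF-8 execution character set (GCC's default), the compiler stores this escape as the four bytes F0 9F 94 8E in an ordinary char string, so no wchar_t is needed, and it is output correctly with printf.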
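If the code point is only known at run time (for example as a decimal number), the \U escape cannot be used, because escapes are resolved by the compiler. One possible approach, sketched here under the assumption that the terminal and locale are UTF-8, is to encode the code point into a small char buffer by hand. The name utf8_encode is hypothetical and not a library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): encode one Unicode code point
 * as UTF-8 into out, which must hold at least 5 bytes. The result is
 * NUL-terminated. Returns the number of bytes written (excluding the NUL),
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, char out[5])
{
    unsigned char *p = (unsigned char *)out;
    size_t n = 0;

    if (cp >= 0xD800 && cp <= 0xDFFF) {
        n = 0;                                   /* surrogate halves are not characters */
    } else if (cp < 0x80) {
        p[0] = (unsigned char)cp;
        n = 1;
    } else if (cp < 0x800) {
        p[0] = (unsigned char)(0xC0 | (cp >> 6));
        p[1] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        p[0] = (unsigned char)(0xE0 | (cp >> 12));
        p[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        p[2] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp < 0x110000) {
        p[0] = (unsigned char)(0xF0 | (cp >> 18));
        p[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        p[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        p[3] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 4;
    }

    p[n] = '\0';
    return n;
}
```

After utf8_encode(128270, buf) the buffer can be passed to printw("%s\n", buf) or printf in the same way as a string literal containing \U0001F50E. Each character costs at most 4 bytes plus the terminator.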


P.S.: If instead of the magnifying glass you see something like ~_~T~N, make sure that you have called setlocale and actually linked against -lncursesw. Failing to do either means that garbage is printed instead.
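When tracking down the garbage output, it can help to confirm that the locale really is UTF-8 before initscr() is called. The following is a small diagnostic sketch using POSIX nl_langinfo; the function name locale_is_utf8 is hypothetical, and the codeset spellings it compares against are an assumption about what common systems report.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper (illustration only): returns 1 if the currently
 * active locale reports a UTF-8 codeset, 0 otherwise.
 * Call setlocale(LC_ALL, "") first so the environment's locale is active. */
int locale_is_utf8(void)
{
    const char *codeset = nl_langinfo(CODESET);

    /* Assumption: systems report "UTF-8" (or occasionally "utf8"). */
    return strcmp(codeset, "UTF-8") == 0 || strcmp(codeset, "utf8") == 0;
}
```

If this returns 0 after setlocale(LC_ALL, ""), the environment (LANG or LC_ALL) is likely not set to a UTF-8 locale, and ncursesw will not render multibyte characters as expected.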
