UTF-8 -> ASCII in C language

Tags: c, ascii, utf-8

I have a simple question that I can't find answered anywhere on the internet: how can I convert UTF-8 to ASCII (mostly mapping accented characters to the same character without the accent) in C, using only the standard library? I found solutions for most other languages, but not for C in particular.

Thanks!

EDIT: Some of the kind people who commented made me double-check what I needed, and I had exaggerated. I only need an idea of how to write a function that does: char with accent -> char without accent. :)
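One possible shape for such a function, using only the standard library, is sketched below. It is an illustration rather than a complete solution: it assumes the input is UTF-8 and only knows the accented letters in the Latin-1 Supplement range (U+00C0 to U+00FF), which UTF-8 encodes as the byte 0xC3 followed by one byte in 0x80..0xBF. The names `strip_accents` and `latin1_base` are made up for this sketch, and the choice to emit '?' for anything it cannot map is an assumption, not a convention.

```c
#include <assert.h>
#include <stddef.h>

/* Base letters for U+00C0..U+00FF, indexed by (code point - 0xC0).
 * '?' marks characters with no single-letter ASCII base
 * (AE ligatures, eth, thorn, sharp s, multiplication and division signs). */
static const char latin1_base[] =
    "AAAAAA?CEEEEIIII"   /* U+00C0..U+00CF */
    "?NOOOOO?OUUUUY??"   /* U+00D0..U+00DF */
    "aaaaaa?ceeeeiiii"   /* U+00E0..U+00EF */
    "?nooooo?ouuuuy?y";  /* U+00F0..U+00FF */

/* Copies src to dst, replacing accented Latin-1 letters with their base
 * letter. Any other non-ASCII character becomes a single '?'.
 * dst is always NUL-terminated; output is truncated to fit dst_size. */
void strip_accents(const char *src, char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL && dst_size > 0);

    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s != '\0' && n + 1 < dst_size) {
        unsigned char b = *s++;

        if (b < 0x80) {
            /* Plain ASCII: copy unchanged. */
            dst[n++] = (char)b;
        } else if (b == 0xC3 && (*s & 0xC0) == 0x80) {
            /* Two-byte sequence for U+00C0..U+00FF: look up the base letter. */
            dst[n++] = latin1_base[*s++ - 0x80];
        } else if (b >= 0xC0) {
            /* Some other multi-byte character: skip its continuation bytes. */
            while ((*s & 0xC0) == 0x80)
                s++;
            dst[n++] = '?';
        }
        /* A stray continuation byte (0x80..0xBF) is silently dropped. */
    }
    dst[n] = '\0';
}
```

Because the output is never longer than the input, a destination buffer of `strlen(src) + 1` bytes is always large enough. Calling `strip_accents("caf\xC3\xA9", buf, sizeof buf)` leaves `cafe` in `buf`. Letters outside this range (for example those with carons or ogoneks) would need a larger table.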

diogocarmo asked Sep 15 '10 19:09



People also ask

Can UTF-8 be read as ASCII?

Each character is represented by one to four bytes. UTF-8 is backward-compatible with ASCII and can represent any standard Unicode character. The first 128 UTF-8 characters precisely match the first 128 ASCII characters (numbered 0-127), meaning that existing ASCII text is already valid UTF-8.

Is UTF-8 and ASCII same?

For characters represented by the 7-bit ASCII character codes, the UTF-8 representation is exactly equivalent to ASCII, allowing transparent round-trip migration. Other Unicode characters are represented in UTF-8 by sequences of up to 4 bytes (the original design allowed up to 6, but RFC 3629 limits it to 4), though most Western European characters require only 2 bytes.

Does C use UTF-8?

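Most C string library routines still work with UTF-8, since they only scan for the terminating NUL byte, and UTF-8 never uses a zero byte inside a multi-byte character. They count and compare bytes, though, not characters.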
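A small sketch of what that means in practice: `strlen` reports the number of bytes, while the number of characters (code points) can be found by skipping UTF-8 continuation bytes, which always have the bit pattern 10xxxxxx. The helper name `utf8_count_code_points` is made up for this example, and it assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts code points in a UTF-8 string by counting every byte that is
 * not a continuation byte (continuation bytes look like 10xxxxxx). */
size_t utf8_count_code_points(const char *s)
{
    assert(s != NULL);

    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the string `"caf\xC3\xA9"` (the word "café" in UTF-8), `strlen` returns 5 because the é takes two bytes, while `utf8_count_code_points` returns 4.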


1 Answer

Take a look at libiconv. Even if you insist on doing it without libraries, you might find inspiration there.
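For reference, a minimal sketch of the iconv route might look like the following. Note that `iconv` is a POSIX interface (built into glibc, or provided by GNU libiconv), not part of the C standard library. The `//TRANSLIT` suffix is an extension of those implementations, and what it produces for a given accented character depends on the implementation and the current locale, so an unmappable character may come out as '?'. The wrapper name `utf8_to_ascii_iconv` is made up for this example.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Converts a UTF-8 string to ASCII with iconv, asking for transliteration.
 * Returns 0 on success and -1 if the conversion is unavailable or fails.
 * dst is always NUL-terminated. */
int utf8_to_ascii_iconv(const char *src, char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL && dst_size > 0);
    dst[0] = '\0';

    /* Arguments are (to, from). "//TRANSLIT" is an implementation extension. */
    iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;          /* iconv takes a non-const char ** */
    size_t in_left = strlen(src);
    char *out = dst;
    size_t out_left = dst_size - 1;  /* keep one byte for the terminator */

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    *out = '\0';
    iconv_close(cd);

    return rc == (size_t)-1 ? -1 : 0;
}
```

With glibc, calling `setlocale(LC_ALL, "")` under a UTF-8 locale before converting usually gives better transliterations than the default "C" locale. On systems where iconv lives in a separate library, linking with `-liconv` may be required.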

zoul answered Oct 17 '22 11:10
