
Light C Unicode Library [closed]

Tags: c, unicode, utf-8

I'm looking for a small C library to handle UTF-8 strings.

Specifically, I need to split strings on Unicode delimiters so the resulting tokens can be fed to stemming algorithms.
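To make the requirement concrete, the operation in question could be sketched in plain C roughly as follows. This is a minimal sketch under stated assumptions, not code from any of the libraries mentioned in this thread: the names `utf8_decode`, `is_delimiter`, and `next_token` are hypothetical, and the delimiter set is assumed to be just the code points with the Unicode White_Space property. A real stemming pipeline would probably treat punctuation as delimiters too.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from s, where n >= 1 bytes are available.
 * Returns the number of bytes consumed (1-4) and stores the code point
 * in *cp. Malformed input consumes 1 byte and yields U+FFFD. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs 2, 3, or 4 bytes. */
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t len, i;
    uint32_t c;

    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else                            { *cp = 0xFFFD; return 1; }

    if (len > n) { *cp = 0xFFFD; return 1; }   /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values above U+10FFFF. */
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        *cp = 0xFFFD;
        return 1;
    }
    *cp = c;
    return len;
}

/* Assumed delimiter set: code points with the White_Space property. */
static int is_delimiter(uint32_t cp)
{
    return (cp >= 0x09 && cp <= 0x0D) || cp == 0x20 || cp == 0x85 ||
           cp == 0xA0 || cp == 0x1680 || (cp >= 0x2000 && cp <= 0x200A) ||
           cp == 0x2028 || cp == 0x2029 || cp == 0x202F ||
           cp == 0x205F || cp == 0x3000;
}

/* Find the next token at or after byte offset *pos in text[0..n).
 * Returns a pointer to the token, stores its byte length in *tok_len,
 * and advances *pos to the first byte after the token.
 * Returns NULL when no tokens remain. */
static const char *next_token(const char *text, size_t n,
                              size_t *pos, size_t *tok_len)
{
    const unsigned char *s = (const unsigned char *)text;
    size_t i = *pos, start;
    uint32_t cp;

    while (i < n) {                       /* skip leading delimiters */
        size_t used = utf8_decode(s + i, n - i, &cp);
        if (!is_delimiter(cp)) break;
        i += used;
    }
    if (i >= n) { *pos = n; return NULL; }

    start = i;
    while (i < n) {                       /* scan to the next delimiter */
        size_t used = utf8_decode(s + i, n - i, &cp);
        if (is_delimiter(cp)) break;
        i += used;
    }
    *pos = i;
    *tok_len = i - start;
    return text + start;
}
```

Calling `next_token` in a loop, starting with `*pos` set to 0, walks the buffer without copying or modifying it. Each token is returned as a pointer plus a byte length, which can be handed straight to a stemmer that accepts length-delimited UTF-8.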

Related posts have suggested:

ICU: http://www.icu-project.org/ (I found it too bulky for my purposes on embedded devices)

UTF8-CPP: http://utfcpp.sourceforge.net/ (Excellent, but C++ not C)

Has anyone found a platform-independent library with a small codebase for handling Unicode strings? It doesn't need to do normalisation.

Akusete asked Nov 24 '08 06:11




2 Answers

A nice, light library that I have used successfully is utf8proc.

Avi answered Sep 23 '22 10:09


There's also MicroUTF-8, but it may require login credentials to view or download the source.

xenu answered Sep 22 '22 10:09