
How to convert Unicode code point values (UTF-16) to a C char array

I have an API that takes Unicode data as a C character array and sends it as a correctly encoded Unicode SMS.

Now I have four code point values corresponding to four characters in some native alphabet, and I want to send them correctly by inserting them into a C char array.

I tried

char test_data[] = {"\x00\x6B\x00\x6A\x00\x63\x00\x69"};

where 0x006B is one code point and so on.

Internally, the API calls

int len = mbstowcs(NULL,test_data,0);

which returns 0 for the data above. It seems that 0x00 is treated as a terminating null.

I want to assign the above code points to a C array correctly so that they arrive as the corresponding UTF-16 characters on the receiving phone (which does support the character set). If required, I have the freedom to change the API too.

Platform is Linux with glib

fkl asked Mar 11 '26 01:03



1 Answer

UTF-16BE is not the native execution (AKA multibyte) character set, and mbstowcs expects a null-terminated string, so this will not work: the leading 0x00 byte ends the string before any character is read. Since you are using Linux, the function probably expects any char[] sequence to be UTF-8.

I believe you can transcode character data in Linux using uniconv. I've only used the ICU4C project.

Your code would read the UTF-16BE data, transcode it to a common form (e.g. uint8_t), then transcode it to the native execution character set before calling the API (which will then transcode it to the native wide character set).

Note: this may be a lossy process if the execution character set does not contain the relevant code points, but you have no choice because this is what the API is expecting. But as I noted above, modern Linux systems should default to UTF-8. I wrote a little bit about transcoding codepoints in C here.

McDowell answered Mar 12 '26 17:03



