I have an API that takes Unicode data as a C character array and sends it as a correctly encoded Unicode SMS.
Now I have four code point values, corresponding to four characters in a native alphabet, and I want to send them correctly by placing them in a C char array.
I tried:
char test_data[] = {"\x00\x6B\x00\x6A\x00\x63\x00\x69"};
where 0x006B is the first code point (stored as the two bytes 0x00 0x6B), and so on.
Internally, the API calls:
int len = mbstowcs(NULL,test_data,0);
which returns 0 for the array above. It seems the leading 0x00 byte is being treated as the terminating null.
I want to put the above code points into the C array so that they arrive as the corresponding UTF-16 characters on the receiving phone (which does support the character set). If required, I can change the API too.
The platform is Linux with glib.
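UTF-16BE is not the native execution (a.k.a. multibyte) character set, and mbstowcs expects a null-terminated string, so this will not work. Since you are using Linux, the function is probably expecting any char[] sequence to be UTF-8. Strictly speaking, mbstowcs decodes according to the LC_CTYPE category of the current locale, which remains the "C" locale until the program calls setlocale.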
I believe you can transcode character data on Linux using iconv (iconv_open, iconv and iconv_close, which are part of glibc). I have only used the ICU4C project myself.
Your code would read the UTF-16BE data, transcode it to a common intermediate form (e.g. UTF-8 held in uint8_t), then transcode that to the native execution character set before calling the API (which will then transcode it to the native wide character set).
Note: this may be a lossy process if the execution character set does not contain the relevant code points, but you have no choice, because this is what the API expects. As noted above, though, modern Linux systems should default to UTF-8, in which case nothing is lost. I wrote a little bit about transcoding code points in C here.