Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I use wstring(s) in Linux APIs?

Tags:

c++

linux

wstring

I want to develope an application in Linux. I want to use wstring beacuse my application should supports unicode and I don't want to use UTF-8 strings.

In Windows OS, using wstring is easy. beacuse any ANSI API has a unicode form. for example there are two CreateProcess API, first API is CreateProcessA and second API is CreateProcessW.

wstring app = L"C:\\test.exe";
CreateProcess
(
  app.c_str(), // EASY!
  ....
);

But it seems working with wstring in Linux is complicated! for example there is an API in Linux called parport_open (It just an example).

and I don't know how to send my wstring to this API (or APIs like parport_open that accept a string parameter).

wstring name = L"myname";
parport_open
(
  0, // or a valid number. It is not important in this question.
  name.c_str(), // Error: because type of this parameter is char* not wchat_t*
  ....
);

My question is how can I use wstring(s) in Linux APIs?

Note: I don't want to use UTF-8 strings.

Thanks

like image 635
Amir Saniyan Avatar asked Sep 04 '11 14:09

Amir Saniyan


1 Answers

Linux APIs (on recent kernels and with correct locale setting) on almost every distribution use UTF-8 strings by default1. You too should use them inside your code. Resistance is futile.

The wchar_t (and thus wstring) on Windows were convenient only when Unicode was limited to 65536 characters (i.e. wchar_t were used for UCS-2), now that the 16-bit Windows wchar_t are used for UTF-16 the advantage of 1 wchar_t=1 Unicode character is long gone, so you have the same disadvantages of using UTF-8. Nowadays IMHO the Linux approach is the most correct. (Another answer of mine on UTF-16 and why Windows and Java use it)

By the way, both string and wstring aren't encoding-aware, so you can't reliably use any of these two to manipulate Unicode code points. I heard that wxString from the wxWidgets toolkit handles UTF-8 nicely, but I never did extensive research about it.


  1. actually, as pointed out below, the kernel aims to be encoding-agnostic, i.e. it treats the strings as opaque sequences of (NUL-terminated?) bytes (and that's why encodings that use "larger" character types like UTF-16 cannot be used). On the other hand, wherever actual string manipulation is done, the current locale setting is used, and by default on almost any modern Linux distribution it is set to UTF-8 (which is a reasonable default to me).
like image 180
Matteo Italia Avatar answered Oct 05 '22 00:10

Matteo Italia