
What is current best practice around use of strings in cross-platform C and C++ APIs?

It looks like I may need to embark on a cross-platform project, and part of it will have to be done in C or C++ (not decided yet, hence the question is about both). I will be dealing mostly with text-based stuff and strings in general.

That C/C++ will have an API callable from the higher-level platform-dependent code.

My question is: what type(s) is it advisable to use to work with strings, in particular when declaring public interfaces? Are there any recommended standard techniques? Are there things to avoid?

I have little experience writing C or C++ code, and even that was on Windows, so nothing cross-platform at all. What I'm really looking for is something to set me on the right track and help me avoid doing stupid things that are bound to cause a lot of pain.


Edit 1: To give a bit more context about the intended use. The API will be consumed by:

  • Objective-C on iPhone/iPad/Mac via NSString and friends. The API can be statically linked, so no need to worry about .so/.dll issues here.

  • Java via JNI on Android and other Java platforms

  • .NET via P/Invoke from managed C# code, or natively statically linked if using C++/CLI.

  • There are some thoughts about using Lua somehow/somewhere in this context. I don't know if this has any bearing on anything, though.

Philip P. asked Jul 26 '11 14:07


2 Answers

Rules

  • Use UTF formats to store strings, not "code pages" or whatnot. (UTF-16 is probably easier. Edit: I totally forgot about byte order issues; UTF-8 is probably the way to go.)

  • Use null-terminated strings instead of counted strings, as these are the easiest to access from most languages. But be careful about buffer overflows.
    Update 6 years later: I recommended this for interoperability reasons (since so many APIs already use null-termination, and there are multiple ways to represent counted strings), not because it is the best design. Today I would probably say interoperability is less important, and recommend using counted strings rather than null-terminated strings if you can.

  • Do not even try to use classes like std::string to pass around strings to/from the user. Even your own program can break after upgrading your compiler/libraries (since their implementation detail is just that: an implementation detail), let alone the fact that non-C++ programs will have trouble with it.
    Update 6 years later: This is strictly about ABI and language compatibility with other languages, not general advice for C++ program development. If you're doing C++ development, cross-platform or otherwise, use the STL! That is, only follow this advice if you need to call your code from other languages.

  • Avoid allocating strings for the user unless it's truly painful for the user otherwise. Instead, take in a buffer and fill it up with data. That way you don't have to force the user to use a particular function to free the data. (This is often a performance advantage as well, since it lets the user allocate small buffers on the stack.) But if you do allocate strings for the user, provide your own function to free the data. You can't assume that your malloc or new can be freed with their free or delete; they often can't be.
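For the case where the library really does have to allocate, here is a minimal sketch of pairing the allocating function with a matching free function. The names mylib_make_greeting and mylib_free_string are made up for illustration; the point is only that the memory is released by the same library (and hence the same allocator) that created it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical API: returns a newly allocated, null-terminated UTF-8 string
   of the form "Hello, <name>", or NULL on failure. The caller must release
   it with mylib_free_string(), never with its own free() or delete. */
char *mylib_make_greeting(const char *name)
{
    static const char prefix[] = "Hello, ";
    size_t prefix_len = sizeof prefix - 1;
    size_t name_len;
    char *out;

    if (name == NULL)
        return NULL;

    name_len = strlen(name);
    out = malloc(prefix_len + name_len + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, prefix, prefix_len);
    memcpy(out + prefix_len, name, name_len + 1); /* copies the terminator too */
    return out;
}

/* Frees a string returned by this library, using the library's own allocator. */
void mylib_free_string(char *s)
{
    free(s);
}
```

A caller in another language would wrap the pair: call mylib_make_greeting, copy the bytes into its native string type (NSString, java.lang.String, System.String), then hand the pointer straight back to mylib_free_string.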

Note:

Just to clarify, "let the user allocate the buffer" and "use null-terminated strings" do not run against each other. You still need to get the buffer length from the user, and you write the terminating null character within that length. My point was not that you should make a function similar to scanf("%s"), which is obviously unusably dangerous. In other words, do pretty much what Windows does in this regard.
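A minimal sketch of the "caller supplies the buffer and its length" pattern described above. The function name mylib_get_version and the version string are hypothetical, and the return convention is borrowed from snprintf (report the length that was needed) rather than copied from any particular Windows API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical API: copies the library's version string into `buf`, always
   null-terminating when buf_size > 0. Returns the number of bytes needed,
   not counting the terminator, so a return value >= buf_size means the
   output was truncated and the caller can retry with a larger buffer. */
size_t mylib_get_version(char *buf, size_t buf_size)
{
    static const char version[] = "1.2.3"; /* made-up value for the sketch */
    size_t needed = sizeof version - 1;

    if (buf != NULL && buf_size > 0) {
        size_t n = needed < buf_size - 1 ? needed : buf_size - 1;
        memcpy(buf, version, n);
        buf[n] = '\0';
    }
    return needed;
}
```

Passing NULL with a size of 0 lets the caller ask for the required length first. With real UTF-8 data, a production version would also want to avoid truncating in the middle of a multi-byte sequence.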

user541686 answered Oct 16 '22 22:10


That C/C++ will have an API callable from the higher-level platform-dependent code.

If by this you mean that you intend this library to be a DLL which may be called from other languages, for example .NET languages, then I strongly recommend exposing the entire public API as extern "C" functions that have only POD types as parameters and return values. That is, prefer /*const*/ char* over std::string. Remember, C++, unlike plain C, has no standard ABI.
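As a rough sketch of what such an interface can look like, the declaration below is wrapped in the usual `#ifdef __cplusplus` guard so the same header works from both C and C++. The function name mylib_utf8_count_code_points is invented for this example, and it assumes the input is valid UTF-8.

```c
#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical public API: only plain C types cross the boundary.
   Counts the code points in a null-terminated, valid UTF-8 string.
   Returns 0 for NULL. */
size_t mylib_utf8_count_code_points(const char *utf8);

#ifdef __cplusplus
}
#endif

/* Implementation. In a C++ build this part could use std::string internally;
   only the signature above has to stay C-compatible. */
size_t mylib_utf8_count_code_points(const char *utf8)
{
    size_t count = 0;

    if (utf8 == NULL)
        return 0;

    for (; *utf8 != '\0'; utf8++) {
        /* Continuation bytes look like 10xxxxxx; every other byte
           starts a new code point. */
        if (((unsigned char)*utf8 & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Because the signature uses only `const char *` and `size_t`, it can be bound from P/Invoke, JNI glue code, or Objective-C without any knowledge of the C++ compiler or standard library used to build the library.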

Armen Tsirunyan answered Oct 16 '22 22:10