
std::regex and dual ABI

Today I have found an interesting case of the dual libstdc++ ABI affecting compatibility of libraries.

Long story short, I have two libraries that both use std::regex internally. One is built with the CXX11 ABI and one is not. When these two libraries are linked together in one executable, it crashes on startup (before main is entered).

The libraries are unrelated and do not expose interfaces that mention any std:: types. I thought such libraries should be immune to dual ABI issues. Apparently not!

The issue can be reproduced easily this way:

// file.cc
#include <regex>
static std::regex foo("(a|b)");

// main.cc
int main() {}

// build.sh
g++ -c -o new.o file.cc
g++ -c -o old.o file.cc -D_GLIBCXX_USE_CXX11_ABI=0
g++ -o main main.cc new.o old.o
./main

And the output is:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

The issue persists whatever I do. file.cc can be made into two separate source files, compiled into separate shared libraries, the two std::regex objects may have different names, they can be made global, static or automatic (one will need to call corresponding functions from main then). None of this helps.

Apparently (this is what comes out of my short investigation) the libstdc++ regex compiler has some kind of internal static data that stores std::string objects, and when two ABI-incompatible pieces of code try to use that data, they have conflicting ideas about the layout of those std::string objects.
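To make the layout difference concrete, here is a minimal sketch (not part of the original reproduction) that reports which std::string a translation unit sees. It assumes libstdc++, where the new-ABI string lives in the inline namespace std::__cxx11 and that namespace shows up in the mangled type name. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <typeinfo>

// Value of _GLIBCXX_USE_CXX11_ABI for this translation unit,
// or -1 when the macro is not defined (for example, not libstdc++).
int abi_macro_value() {
#ifdef _GLIBCXX_USE_CXX11_ABI
    return _GLIBCXX_USE_CXX11_ABI;
#else
    return -1;
#endif
}

// True when the mangled name of std::string mentions the __cxx11
// inline namespace that libstdc++ uses for the new ABI.
bool string_in_cxx11_namespace() {
    return std::strstr(typeid(std::string).name(), "__cxx11") != nullptr;
}

// Object size of std::string as seen by this translation unit.
std::size_t string_object_size() {
    return sizeof(std::string);
}
```

Compiling this once normally and once with -D_GLIBCXX_USE_CXX11_ABI=0 and printing the three values should show two different types; on x86-64 the size is typically 32 bytes with the new ABI and 8 bytes with the old one.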

So my questions are:

  • Is there a workaround for this problem?
  • Should this be considered a bug in libstdc++?

The problem is reproducible in several versions of g++/libstdc++ (I tried a few from 5.4 to 7.1). It doesn't occur with libc++.

asked Jul 17 '18 13:07 by n. 1.8e9-where's-my-share m.



1 Answer

The problem goes back to why libstdc++ has a dual ABI in the first place. Two points matter here: (1) the new ABI was introduced specifically to conform to the C++11 standard's requirements on std::string (and on other types that are not relevant to this discussion); (2) _GLIBCXX_USE_CXX11_ABI works independently of the language dialect, and exists so that C++03 and C++11 code can be compiled and linked together.

The regex module was introduced in C++11 and uses strings internally. Here you build the C++11 (or later) basic_regex template code with _GLIBCXX_USE_CXX11_ABI=0. That means you are using a C++11 regex object with the pre-C++11 implementation of strings.

Should that work? It depends on how regex uses strings: if it relies on the new implementation (for example, on copy-on-write being forbidden), then no; otherwise yes. What can happen? Anything.

The bottom line: you should not use _GLIBCXX_USE_CXX11_ABI=0 for new code written in a post-C++03 dialect (C++11, 14, 17, ...), because it selects implementations that are not compatible with the new guarantees on standard types, particularly std::string.
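One way to enforce this rule is a small compile-time guard included from new sources. This is only a sketch: the file name and the constant are hypothetical, and it assumes that any libstdc++ header has already defined _GLIBCXX_USE_CXX11_ABI by the time the check runs.

```cpp
// abi_guard.h (hypothetical name): include from new C++11-or-later sources.
#include <string>  // any libstdc++ header pulls in the ABI configuration macro

// True only when libstdc++ has been told to use the old (pre-C++11) ABI.
// With other standard libraries the macro is undefined, so this stays false.
constexpr bool old_libstdcxx_abi_selected =
#if defined(_GLIBCXX_USE_CXX11_ABI) && !_GLIBCXX_USE_CXX11_ABI
    true;
#else
    false;
#endif

// Stop the build early instead of risking a runtime crash later.
static_assert(!old_libstdcxx_abi_selected,
              "this code expects the libstdc++ CXX11 ABI; "
              "remove -D_GLIBCXX_USE_CXX11_ABI=0");
```

A translation unit that includes this header fails to compile when it is built with -D_GLIBCXX_USE_CXX11_ABI=0, which catches an accidental old-ABI flag in the build of new code. It does not detect a third-party library that was itself built with the other ABI.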

Can you use _GLIBCXX_USE_CXX11_ABI=0 with -std=c++11 or later? The GCC developers made sure that new code can run with the old ABI, which makes it possible to use new features together with old shared libraries. However, it might not be a good idea: the code is written against the new standard while the standard library does not conform to it, and that can turn out badly later. Your problem is an example of this: the two ABIs can be mixed, and here the mix does not work.

_GLIBCXX_USE_CXX11_ABI=0 is really useful when you need to call, for example, foo(std::string const&) defined in some .so library that was compiled with the old ABI. In that case you compile the source file that makes the call with the old ABI, and keep all other sources on the new ABI.
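A minimal sketch of that arrangement follows. Only this one file would be compiled with -D_GLIBCXX_USE_CXX11_ABI=0, and it exposes a wrapper whose signature contains no std:: types, so the rest of the program can stay on the new ABI. The wrapper name is hypothetical, and foo is given a stand-in body here only so the sketch links on its own; in a real project it would be a declaration resolved by the old-ABI library.

```cpp
// old_abi_shim.cc (hypothetical): the only file built with
//   g++ -c -D_GLIBCXX_USE_CXX11_ABI=0 old_abi_shim.cc
#include <cstddef>
#include <string>

// Stand-in for the function exported by the old-ABI .so.
// Assumption: the real foo has this signature and returns a size.
std::size_t foo(std::string const& s) {
    return s.size();
}

// ABI-neutral wrapper: plain pointer and length in, plain integer out.
// The std::string is created and destroyed inside this translation unit,
// so its layout never crosses the boundary to new-ABI code.
std::size_t foo_via_old_abi(const char* data, std::size_t len) {
    return foo(std::string(data, len));
}
```

Callers compiled with the new ABI would declare only foo_via_old_abi and pass s.data() and s.size(). Note that this isolates the std::string in the interface; it does not by itself address the std::regex crash from the question, where no std:: type appears in any interface.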

As for the remark in the question that the problem is reproducible in several versions of g++/libstdc++ but "doesn't occur with libc++": libc++ does not have this duality; it has a single string implementation.

I cannot give a definite answer about where this exception comes from or why. I can only guess that there is some shared global resource related to regex, string, or locale that is not clearly separated between the ABIs. The two ABIs then work with it differently, which can result in anything: an exception, a segmentation fault, or other unexpected behavior. Personally, I prefer to stick to the rules mentioned above, which most closely reflect the intent of _GLIBCXX_USE_CXX11_ABI and the dual ABI.

answered Nov 03 '22 05:11 by Yuki
