
Transitioning away from std::string, std::ostream, etc. in a library's public API

For API/ABI compatibility across many toolchains with the same binary, it is well known that STL containers, std::string, and other standard library classes like iostreams are verboten in public headers. (The exceptions are: distributing one build for each supported toolchain version; delivering source with no binaries for end-user compilation; or translating to some other container inline so that a differing std implementation doesn't get ingested by the library. The first two are not preferred options in the present case.)

If one already had a published library API that did not follow this rule (asking for a friend), what is the best path forward while maintaining as much backwards compatibility as I reasonably can and favoring compile-time breakages where I can't? I need to support Windows and Linux.

Re the level of ABI compatibility I'm looking for: I don't need it to be insanely future-proof. I'm mainly looking to do just one library binary for multiple, popular Linux distros per release. (At present, I release one per compiler and sometimes special versions for a special distro (RHEL vs Debian). Same sort of concerns with MSVC versions -- one DLL for all supported MSVC versions would be ideal.) Secondarily, if I don't break the API in a bugfix release, I would like it to be ABI-compatible and a drop-in DLL/SO replacement without rebuilding the client application.

I have three cases with some tentative suggestions, modeled after Qt to a degree.

Old public API:

// Case 1: Non-virtual functions with containers
void Foo( const char* );
void Foo( const std::string& );

// Case 2: Virtual functions
class Bar
{
public:
    virtual ~Bar() = default;
    virtual void VirtFn( const std::string& );
};

// Case 3: Serialization
std::ostream& operator << ( std::ostream& os, const Bar& bar );

Case 1: Non-virtual functions with containers

In theory we can convert std::string uses to a class very much like std::string_view but under our library's API/ABI control. It will convert within our library header from a std::string so that the compiled library still accepts but is independent of the std::string implementation and is backwards compatible:

New API:

class MyStringView
{
public:
    MyStringView( const std::string& ) // Implicit and inline
    {
        // Convert, possibly copying
    }

    MyStringView( const char* ); // Implicit
    // ...   
};

void Foo( MyStringView ); // Ok! Mostly backwards compatible

Most client code that is not doing something abnormal like taking the address of Foo will work without modification. Likewise, we can create our own std::vector replacement, though it may incur a copying penalty in some cases.
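A minimal sketch of what such a view could look like, assuming it is non-owning and holds only a pointer and a length, so no copy is made and the library binary never sees a std type. The member names and the `FooLength` stand-in are hypothetical, not part of the original API. As with std::string_view, the viewed string must outlive the view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical layout: two plain members, so the binary interface does not
// depend on any standard library type.
class MyStringView
{
public:
    // Inline and implicit: compiled in the client's translation unit against
    // the client's own std::string, so the library never touches it.
    MyStringView( const std::string& s ) : data_( s.data() ), size_( s.size() ) {}

    // Implicit; inline here for brevity.
    MyStringView( const char* s ) : data_( s ), size_( 0 )
    {
        assert( s != nullptr ); // precondition assumed by this sketch
        size_ = std::strlen( s );
    }

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const char* data_;
    std::size_t size_;
};

// Stand-in for an exported library function; in a real library only the
// declaration would appear in the public header.
inline std::size_t FooLength( MyStringView sv ) { return sv.size(); }
```

Both old call forms keep compiling: a std::string argument and a string literal each need exactly one user-defined conversion to reach `MyStringView`, so overload resolution stays unambiguous.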

Abseil's TotW #1 recommends starting at the util code and working up instead of starting at the API. Any other tips or pitfalls here?

Case 2: Virtual functions

But what about virtual functions? We break backwards compatibility if we change the signature. I suppose we could leave the old one in place with final to force breakage:

// Introduce base class for functions that need to be final
class BarBase
{
public:
    virtual ~BarBase() = default;
    virtual void VirtFn( const std::string& ) = 0;
};

class Bar : public BarBase
{
public:
    void VirtFn( const std::string& str ) final
    {
        VirtFn( MyStringView( str ) );
    }

    // Add new overload, also virtual
    virtual void VirtFn( MyStringView );
};

Now an override of the old virtual function will break at compile time, while calls that pass a std::string will be automagically converted. Overrides should be ported to the new MyStringView version instead.

Any tips or pitfalls here?

Case 3: Serialization

I'm not sure what to do with iostreams. One option, at the risk of some inefficiency, is to define them inline and reroute them through strings:

MyString ToString( const Bar& ); // I control this, could be a virtual function in Bar if needed

// Here I publicly interact with a std object, so it must be inline in the header
inline std::ostream& operator << ( std::ostream& os, const Bar& bar )
{
    return os << ToString( bar );
}

If I made ToString() a virtual function, then I can iterate over all Bar objects and call the user's overrides because it only depends on MyString objects, which are defined in the header where they interact with std objects like the stream.

Thoughts, pitfalls?

Asked by metal, Mar 01 '18 22:03


1 Answer

Tier 1

Use a good string view.

Don't use a std::string const& virtual overload; there is no reason for it. You are breaking ABI anyhow. Once they recompile, they'll see the new string-view based overload, unless they are taking and storing pointers to virtual functions.

To stream without going through an intermediate string, use continuation-passing style:

void ToString_CPS( Bar const& bar, FunctionView< void( MyStringView ) > cps );

where cps is called repeatedly with partial buffers until the object is fully serialized. Write << on top of that (inline in headers). There is some unavoidable overhead from function pointer indirection.
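To make the chunked-callback idea concrete, here is a rough sketch that uses a raw function pointer plus a void* context, which is the same state a function view would wrap. `Chunk`, `ChunkSink`, the toy `Bar` with an `id` field, and `ToStdString` are assumptions made for this example only. In a real library `ToString_CPS` would be compiled into the binary rather than defined inline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>

// Stand-ins for this sketch only (not the library's real types).
struct Chunk { const char* data; std::size_t size; }; // plays the role of MyStringView
struct Bar   { int id; };                             // plays the role of the real Bar
using ChunkSink = void (*)( void* context, Chunk chunk );

// Library side: emits the text in pieces and never touches a std stream or string.
inline void ToString_CPS( const Bar& bar, void* context, ChunkSink sink )
{
    static const char prefix[] = "Bar#";
    sink( context, Chunk{ prefix, sizeof prefix - 1 } );

    char digits[16];
    const int n = std::snprintf( digits, sizeof digits, "%d", bar.id );
    assert( n > 0 && static_cast<std::size_t>( n ) < sizeof digits );
    sink( context, Chunk{ digits, static_cast<std::size_t>( n ) } );
}

// Header side: compiled by the client, so it may use the client's std::ostream.
inline std::ostream& operator<<( std::ostream& os, const Bar& bar )
{
    ToString_CPS( bar, &os, []( void* context, Chunk chunk ) {
        static_cast<std::ostream*>( context )->write(
            chunk.data, static_cast<std::streamsize>( chunk.size ) );
    } );
    return os;
}

// Usage: collect the chunks into a std::string on the client side.
inline std::string ToStdString( const Bar& bar )
{
    std::ostringstream oss;
    oss << bar;
    return oss.str();
}
```

The sink is invoked once per chunk, so the library needs no buffer larger than its biggest piece and no intermediate string is built on its side.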

From now on, use virtual only in interfaces, never overload virtual methods, and always add new methods at the end of the vtable. That also means not exposing complex hierarchies. Extending a vtable at the end is ABI safe; adding to the middle is not.
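As an illustration of those rules, the sketch below shows a flat, pure-virtual interface whose later release only appends a method. All names (`IBar`, `StringRef`, `BarImpl`, `GetBar`) are hypothetical. It also assumes the library, not the client, implements the interface: a client-side implementation built against the older header would lack the appended slot.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical POD string reference passed across the boundary.
struct StringRef { const char* data; std::size_t size; };

// Pure interface: no data members, no overloaded virtuals, one flat level.
class IBar
{
public:
    // --- release 1.0 ---
    virtual void VirtFn( StringRef text ) = 0;
    virtual std::size_t TotalBytes() const = 0;
    // --- release 1.1: appended, so the earlier slots keep their positions ---
    virtual void Reset() = 0;

protected:
    ~IBar() = default; // clients never delete through the interface
};

// Library-side implementation, normally hidden in a .cpp file.
class BarImpl final : public IBar
{
public:
    void VirtFn( StringRef text ) override
    {
        assert( text.data != nullptr || text.size == 0 );
        total_ += text.size;
    }
    std::size_t TotalBytes() const override { return total_; }
    void Reset() override { total_ = 0; }

private:
    std::size_t total_ = 0;
};

// Hypothetical accessor; a real library would export something like this.
inline IBar& GetBar()
{
    static BarImpl instance;
    return instance;
}
```

A client built against release 1.0 only ever calls the first two slots, which is why appending `Reset()` leaves it working against a 1.1 binary.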

FunctionView is a simple hand-rolled, non-owning std::function clone whose state is a void* and an R(*)(void*, Args&&...), which should be ABI stable to pass across a library boundary.

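#include <memory>      // std::addressof
#include <type_traits> // std::enable_if_t, std::decay_t, std::result_of_t
#include <utility>     // std::forward

template<class Sig>
struct FunctionView;

// Non-owning: the wrapped callable must outlive the view.
template<class R, class...Args>
struct FunctionView<R(Args...)> {
  FunctionView()=default;
  FunctionView(FunctionView const&)=default;
  FunctionView& operator=(FunctionView const&)=default;

  // std::result_of_t is the C++14 spelling; use std::invoke_result_t from C++17 on.
  template<class F,
    std::enable_if_t<!std::is_same< std::decay_t<F>, FunctionView >{}, bool> = true,
    std::enable_if_t<std::is_convertible< std::result_of_t<F&(Args&&...)>, R >{}, bool> = true
  >
  FunctionView( F&& f ):
    // Cast via void const* so that const callables are accepted too.
    ptr( const_cast<void*>( static_cast<void const*>( std::addressof(f) ) ) ),
    f( [](void* ptr, Args&&...args)->R {
      return (*static_cast< std::remove_reference_t<F>* >(ptr))(std::forward<Args>(args)...);
    } )
  {}

  explicit operator bool() const { return f != nullptr; }

  // Calling an empty view is undefined; check with operator bool first.
  R operator()( Args...args ) const {
    return f( ptr, std::forward<Args>(args)... );
  }
private:
  void* ptr = 0;
  R(*f)(void*, Args&&...args) = 0;
};

// Specialization for void: any return value of the callable is discarded.
template<class...Args>
struct FunctionView<void(Args...)> {
  FunctionView()=default;
  FunctionView(FunctionView const&)=default;
  FunctionView& operator=(FunctionView const&)=default;

  template<class F,
    std::enable_if_t<!std::is_same< std::decay_t<F>, FunctionView >{}, bool> = true
  >
  FunctionView( F&& f ):
    ptr( const_cast<void*>( static_cast<void const*>( std::addressof(f) ) ) ),
    f( [](void* ptr, Args&&...args)->void {
      (*static_cast< std::remove_reference_t<F>* >(ptr))(std::forward<Args>(args)...);
    } )
  {}

  explicit operator bool() const { return f != nullptr; }

  void operator()( Args...args ) const {
    f( ptr, std::forward<Args>(args)... );
  }
private:
  void* ptr = 0;
  void(*f)(void*, Args&&...args) = 0;
};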

This lets you pass generic callbacks over your API barrier.

// f can be called more than once, be prepared:
void ToString_CPS( Bar const& bar, FunctionView< void(MyStringView) > f );
inline std::ostream& operator<<( std::ostream& os, const Bar& bar )
{
  ToString_CPS( bar, [&](MyStringView str) {
    os << str; // no return: the deduced type would be std::ostream by value, which cannot be copied
  });
  return os;
}

and implement std::ostream& operator<<( std::ostream&, MyStringView const& ) inline in headers.
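One possible shape for that header-only operator, assuming the view exposes a pointer and a length. The `MyStringView` layout and the `Concat` helper below are placeholders for this sketch, not the library's real definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Minimal stand-in for the library's view type (hypothetical layout).
struct MyStringView
{
    const char* data;
    std::size_t size;
};

// Inline in the header: the client's compiler builds it against the client's
// own iostreams, so no std::ostream ever crosses the library boundary.
inline std::ostream& operator<<( std::ostream& os, const MyStringView& sv )
{
    assert( sv.data != nullptr || sv.size == 0 );
    os.write( sv.data, static_cast<std::streamsize>( sv.size ) );
    return os;
}

// Usage: stream two views into a client-side string.
inline std::string Concat( MyStringView a, MyStringView b )
{
    std::ostringstream oss;
    oss << a << b;
    return oss.str();
}
```

Note that `write` is an unformatted output function, so field width and fill settings on the stream are ignored in this sketch. Honoring them would take extra padding logic.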


Tier 2

Forward every operation from a C++ API in headers to extern "C" pure-C functions (i.e. pass StringView as a pair of char const* pointers). Export only an extern "C" set of symbols. Now symbol-mangling changes no longer break your ABI.

C ABI is more stable than C++, and by forcing you to break library calls down into "C" calls you can make ABI breaking changes obvious. Use C++ header glue to make things clean, C to make ABI rock solid.
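A small sketch of that layering, with made-up names (`mylib_foo`, `mylib::Foo`) and a trivial body standing in for real work. The exported function takes the string as a pair of pointers, and the inline C++ overloads in the header do the conversion on the client side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// ---- Exported C ABI: the only symbol the binary exposes (hypothetical name) ----
extern "C" std::size_t mylib_foo( const char* begin, const char* end );

// ---- Header-only C++ glue, compiled by the client ----
namespace mylib
{
    inline std::size_t Foo( const std::string& s )
    {
        return mylib_foo( s.data(), s.data() + s.size() );
    }

    inline std::size_t Foo( const char* s )
    {
        assert( s != nullptr ); // precondition assumed by this sketch
        return mylib_foo( s, s + std::strlen( s ) );
    }
}

// ---- Library side: normally lives in a .cpp file inside the DLL/SO ----
extern "C" std::size_t mylib_foo( const char* begin, const char* end )
{
    assert( begin <= end );
    // Stand-in for real work: report how many bytes were received.
    return static_cast<std::size_t>( end - begin );
}
```

Because only `mylib_foo` is exported, a changed signature surfaces as a new or differently shaped C function rather than as a silently different mangled name.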

You can keep your pure virtual interfaces if you are willing to risk it; use the same rules as above (simple hierarchies, no overloads, only add to the end) and you'll get decent ABI stability.

Answered by Yakk - Adam Nevraumont, Oct 25 '22 14:10