
Why is std::codecvt only used by file I/O streams?

I've been implementing a codecvt for handling indentation of output streams. It can be used like this and works fine:

std::cout << indenter::push << "I'm indented" << indenter::pop << "\n I'm not...";

However, although I can imbue a std::codecvt into any std::ostream, I was very confused to find that my code works with std::cout and std::ofstream but not with, for example, std::ostringstream, even though all of them inherit from the base class std::ostream.

The facet is constructed normally, the code compiles, it doesn't throw any exceptions... It's just that none of the member functions of the std::codecvt are called.

For me that is very confusing, and I had to spend a lot of time figuring out that std::codecvt does nothing on non-file I/O streams.

Is there any reason std::codecvt is not used by all classes that inherit from std::ostream?

Furthermore, does anyone have an idea which facilities I could fall back on to implement the indenter?

Edit: this is the part of the language I'm referring to:

All file I/O operations performed through std::basic_fstream use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.

Source: https://en.cppreference.com/w/cpp/locale/codecvt


Update 1:

I've made a small example illustrating my problem:

#include <iostream>
#include <locale>
#include <fstream>
#include <sstream>

static auto invocation_counter = 0u;

struct custom_facet : std::codecvt<char, char, std::mbstate_t>
{
  using parent_t = std::codecvt<char, char, std::mbstate_t>;

  custom_facet() : parent_t(std::size_t { 0u }) {}

  using parent_t::intern_type;
  using parent_t::extern_type;
  using parent_t::state_type;

  virtual std::codecvt_base::result do_out (state_type& state, const intern_type* from, const intern_type* from_end, const intern_type*& from_next,
                                                               extern_type* to, extern_type* to_end, extern_type*& to_next) const override
  {
    while (from < from_end && to < to_end)
    {
      *to = *from;

      to++;
      from++;
    }

    invocation_counter++;

    from_next = from;
    to_next = to;

    return std::codecvt_base::noconv;
  }

  virtual bool do_always_noconv() const noexcept override
  {
    return false;
  }
};

std::ostream& imbueFacet (std::ostream& ostream)
{
  ostream.imbue(std::locale { ostream.getloc(), new custom_facet{} });

  return ostream;
}

int main()
{
  std::ios::sync_with_stdio(false);

  std::cout << "invocation_counter = " << invocation_counter << "\n";

  {
    auto ofstream = std::ofstream { "testFile.txt" };

    ofstream << imbueFacet << "test\n";
  }

  std::cout << "invocation_counter = " << invocation_counter << "\n";

  {
     auto osstream = std::ostringstream {};

     osstream << imbueFacet << "test\n";
  }

  std::cout << "invocation_counter = " << invocation_counter << "\n";
}

I would expect invocation_counter to increase after streaming into the std::ostringstream, but it doesn't.


Update 2:

After more research I found out that I could use std::wbuffer_convert. To quote https://en.cppreference.com/w/cpp/locale/wbuffer_convert:

std::wbuffer_convert is a wrapper over stream buffer of type std::basic_streambuf<char> which gives it the appearance of std::basic_streambuf<Elem>. All I/O performed through std::wbuffer_convert undergoes character conversion as defined by the facet Codecvt. [...]

This class template makes the implicit character conversion functionality of std::basic_filebuf available for any std::basic_streambuf.

This way I can apply a facet to a std::ostringstream:

auto osstream = std::ostringstream {};

osstream << "test\n";
  
auto facet = custom_facet{};
  
std::wstring_convert<custom_facet, char> conv;
  
auto str = conv.to_bytes(osstream.str());

However, I lose the ability to chain facets using the streaming operator <<.

This makes me wonder even more why std::codecvt is not implicitly used by ALL output streams. All output streams write through a std::basic_streambuf, whose interface seems suitable for std::codecvt, since it just works on an input and an output character sequence, both fully implemented in std::basic_streambuf.

So why is the handling of std::codecvt implemented in std::basic_filebuf instead of std::basic_streambuf? std::basic_filebuf inherits from std::basic_streambuf after all...

Either I have some fundamental misunderstanding on how streams work in C++ or std::codecvt is poorly integrated in the standard. Maybe this is why it is marked as deprecated?

asked Nov 23 '20 23:11 by user3520616




1 Answer

The std::codecvt facet was originally intended to handle I/O conversions between the on-disk and in-memory character representations. Quoted from section 39.4.6 of Bjarne Stroustrup's The C++ Programming Language, fourth edition:

Sometimes, the representation of characters stored in a file differs from the desired representation of those same characters in main memory. ... the codecvt facet provides a mechanism for converting characters from one representation to another as they are read or written.

The intended purpose was thus to use std::codecvt only for adapting characters between file (disk) and memory, which partly answers your question:

Why is std::codecvt only used by file I/O streams?

From the docs we see that:

All file I/O operations performed through std::basic_fstream<CharT> use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.

This explains why std::ofstream (which uses a file-based stream buffer) and std::cout (linked to the standard output FILE stream) invoke std::codecvt. Note that the standard only specifies this behavior for std::basic_filebuf, so whether std::cout consults the facet depends on the implementation.

Now, to use the high-level std::ostream interface you need to provide an underlying streambuf. std::ofstream provides a filebuf and std::ostringstream provides a stringbuf (which does not use std::codecvt). See this post on streams, which also highlights the following:

...in the case of ofstream, there are also a few extra functions which forward to additional functions in the filebuf interface

However, to invoke the character conversion of a std::codecvt on a std::ostringstream, which is a std::ostream backed by a std::basic_stringbuf, you can use std::wbuffer_convert, as indicated in your post.

In your second update you only used std::wstring_convert, not std::wbuffer_convert.

With std::wbuffer_convert you can wrap the buffer of the original std::ostringstream and attach the wrapper to a new std::ostream as follows:

// Create a std::ostringstream
auto osstream = std::ostringstream{};

// Create the wrapper for the ostringstream
std::wbuffer_convert<custom_facet, char> wrapper(osstream.rdbuf());

// Now create a std::ostream which uses the wrapper to send data to
// the original std::ostringstream
std::ostream normal_ostream(&wrapper);
normal_ostream << "test\n";

// Flush the stream to invoke the conversion
normal_ostream << std::flush;

// Check the invocation_counter
std::cout << "invocation_counter after wrapping std::ostringstream with "
                "std::wbuffer_convert = "
            << invocation_counter << "\n";

Together with the complete example here, the output would be:

invocation_counter start of test1 = 0
invocation_counter after std::ofstream = 1
> test printed to std::cout
invocation_counter after std::cout = 2
invocation_counter after std::ostringstream (should not have changed)= 2
ic after test1 = 2
invocation_counter after std::ostringstream with std::wstring_convert = 3
ic after test2 = 3
invocation_counter after wrapping std::ostringstream with std::wbuffer_convert = 4
ic after test3 = 4

Conclusion

std::codecvt was intended for converting between on-disk and in-memory representations. That is why the std::codecvt implementation is only called by streams with an underlying filebuf, such as std::ofstream and std::cout. However, the stringbuf underlying a std::ostringstream can be wrapped with std::wbuffer_convert and attached to a std::ostream instance, which then invokes the std::codecvt.

answered Oct 25 '22 11:10 by Jan Gabriel