Note: This is a question-with-answer in order to document a technique that others might find useful, and in order to perhaps become aware of others’ even better solutions. Do feel free to add critique or questions as comments. Also do feel free to add additional answers. :)
Problem #1:
#include <iostream>
#include <string>
using namespace std;
auto main() -> int
{
wstring username;
wcout << L"Hi, what’s your name? ";
getline( wcin, username );
wcout << "Pleased to meet you, " << username << "!\n";
}
H:\personal\web\blog alf on programming at wordpress\002\code>chcp 65001
Active code page: 65001
H:\personal\web\blog alf on programming at wordpress\002\code>g++ problem.input.cpp -std=c++14
H:\personal\web\blog alf on programming at wordpress\002\code>a
Hi, whatSøren Moskégård
← No visible output.
H:\personal\web\blog alf on programming at wordpress\002\code>_
At the Windows API level a solution is to use non-stream-based direct console i/o when the relevant standard stream is bound to the console, for example via the WriteConsole API function. And as an extension supported by both the Visual C++ and MinGW g++ standard libraries, a mode can be set for the standard wide streams where WriteConsole is used, and there is also a mode for converting to/from UTF-8 as the external encoding.
And in Unix-land, a single call to setlocale( LC_ALL, "" ), or its higher level C++ equivalent, suffices to make the wide streams work.
But how can such modes be set transparently & automatically, so that the same ordinary standard C++ code using the wide streams will work both in Windows and Unix-land?
Note, for readers who shudder at the thought of using wide text in a Unix-land program, that this is in effect a prerequisite for portable code that uses UTF-8 narrow text console i/o in Unix-land. Namely, code that automatically uses UTF-8 narrow text in Unix-land and wide text in Windows becomes possible, because it can be built on top of support for Unicode in Windows. Without such support there is no portability for the general case.
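For the Unix-land side, the single call mentioned above can be sketched roughly as follows. The function name use_user_locale is made up for this illustration, and falling back to the classic "C" locale when the environment names a locale that isn't installed is my assumption about reasonable behavior, not something the problem statement prescribes.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: installs the user's preferred locale as the global
// C++ locale. For a named locale this also effects a corresponding
// setlocale( LC_ALL, ... ) call. Returns the name of the installed locale.
inline auto use_user_locale()
    -> std::string
{
    try
    {
        std::locale::global( std::locale( "" ) );
    }
    catch( std::runtime_error const& )
    {
        // Assumption: when the environment's locale is unavailable, the
        // classic "C" locale is an acceptable fallback.
        std::locale::global( std::locale::classic() );
    }
    return std::locale().name();
}
```

A program would call this once, at the start of main, before the first use of the wide streams.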
Problem #2: output of an object with a user-defined conversion to wchar_t const* doesn't work.
#include <iostream>
using namespace std;
struct Byte_string
{ operator char const* () const { return "Hurray, it works!"; } };
struct Wide_string
{ operator wchar_t const* () const { return L"Hurray, it works!"; } };
auto main() -> int
{
wcout << "Byte string pointer: " << Byte_string() << endl;
wcout << "Wide string pointer: " << Wide_string() << endl;
}
Byte string pointer: Hurray, it works!
Wide string pointer: 0x4ad018
This is a defect in the standard, an inconsistency at the implementation level, that I reported long ago. I'm not sure of its status: it may have been forgotten (I never got any mailings about it), or maybe a fix will be applied in C++17. Anyway, how can one work around that?
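To make the symptom concrete without involving the console, here is a small sketch that formats into a std::wostringstream instead. The two function names are made up for the illustration. The explicit cast in the second function is the crudest possible workaround, shown only to confirm that the conversion itself is fine and that it's the overload selection that goes wrong.

```cpp
#include <sstream>
#include <string>

struct Wide_string
{ operator wchar_t const* () const { return L"Hurray, it works!"; } };

// No cast: the string overload is a template, so it is not considered for
// an argument that needs a user-defined conversion. The best remaining
// candidate takes void const*, so the pointer value gets formatted.
inline auto text_without_cast( Wide_string const& ws )
    -> std::wstring
{
    std::wostringstream stream;
    stream << ws;
    return stream.str();
}

// Explicit cast: the argument now has type wchar_t const*, template
// argument deduction succeeds, and the string overload is selected.
inline auto text_with_cast( Wide_string const& ws )
    -> std::wstring
{
    std::wostringstream stream;
    stream << static_cast<wchar_t const*>( ws );
    return stream.str();
}
```

A cast at every call site is of course not practical, which is why a fix that needs no changes to the calling code is wanted.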
In short, how can one make standard C++ code that uses Unicode wide text console i/o, work and be practical in both Windows and Unix-land?
#pragma once
//----------------------------------------------------------------------------------------
// PROBLEM DESCRIPTION.
//
// Output of wchar_t const* is only supported via an operator<< template. User-defined
// conversions are not considered for template matching. This results in actual argument
// with user conversion to wchar_t const*, for a wide stream, being presented as the
// pointer value instead of the string.
#include <iostream>
#ifndef CPPX_NO_IOSTREAM_CONVERSION_FIX
namespace std{
template< class Char_traits >
inline auto operator<<(
basic_ostream<wchar_t, Char_traits>& stream,
wchar_t const ch
)
-> basic_ostream<wchar_t, Char_traits>&
{ return operator<< <wchar_t, Char_traits>( stream, ch ); }
template< class Char_traits >
inline auto operator<<(
basic_ostream<wchar_t, Char_traits>& stream,
wchar_t const* const s
)
-> basic_ostream<wchar_t, Char_traits>&
{ return operator<< <wchar_t, Char_traits>( stream, s ); }
} // namespace std
#endif
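The mechanism behind this fix can be shown in isolation. The sketch below uses a made-up namespace demo and a made-up function length_of rather than the iostreams: a template alone rejects an argument that needs a user-defined conversion, while an added overload whose pointer parameter type is fixed accepts it and forwards to the template.

```cpp
#include <cstddef>
#include <string>

namespace demo {    // Hypothetical namespace, for illustration only.

    // Template: Char must be deduced from the argument, and deduction
    // never applies a user-defined conversion.
    template< class Char >
    auto length_of( Char const* s )
        -> std::size_t
    { return std::char_traits<Char>::length( s ); }

    // Forwarder with a fixed parameter type: an argument that converts to
    // wchar_t const* is accepted here via that conversion. The explicit
    // template argument makes the call go to the template above.
    inline auto length_of( wchar_t const* s )
        -> std::size_t
    { return length_of<wchar_t>( s ); }

    struct Wide_string
    { operator wchar_t const* () const { return L"Hurray, it works!"; } };

}  // namespace demo
```

With the forwarder present, demo::length_of( demo::Wide_string() ) compiles and yields the string length. Without it, the call would not compile at all, because in this sketch there is no void const* overload to fall back on.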
Setting the stream mode in Windows relies on _setmode, a standard library extension that's supported by both Visual C++ and MinGW g++.
First, just because it's used in the code, here is the definition of the Ptr type builder. The main drawback of library-provided type builders is that ordinary type inference doesn't kick in, i.e. in some cases it's still necessary to use the raw operator notation:
⋮
template< class T > using Ptr = T*;
⋮
A helper definition, because it's used in more than one file:
cppx/stdlib/Iostream_mode.hpp
#pragma once
// Mode for a possibly console-attached iostream, such as std::wcout.
namespace cppx {
enum Iostream_mode: int { unknown, utf_8, direct_io };
} // namespace cppx
Mode setters (base functionality):
cppx/stdlib/impl/utf8_mode.for_windows.hpp
#pragma once
// UTF-8 mode for a stream in Windows.
#ifndef _WIN32
# error This is a Windows only implementation.
#endif
#include <cppx/core_language/type_builders.hpp> // cppx::Ptr
#include <cppx/stdlib/Iostream_mode.hpp>
#include <stdio.h> // FILE, stdin, stdout, stderr, etc.
// Non-standard headers, which are de facto standard in Windows:
#include <io.h> // _setmode, _isatty, _fileno etc.
#include <fcntl.h> // _O_WTEXT etc.
namespace cppx {
inline
auto set_utf8_mode( const Ptr< FILE > f )
-> Iostream_mode
{
const int file_number = _fileno( f ); // See docs for error handling.
if( file_number == -1 ) { return Iostream_mode::unknown; }
const int new_mode = (_isatty( file_number )? _O_WTEXT : _O_U8TEXT);
const int previous_mode = _setmode( file_number, new_mode );
return (0?Iostream_mode()
: previous_mode == -1? Iostream_mode::unknown
: new_mode == _O_WTEXT? Iostream_mode::direct_io
: Iostream_mode::utf_8
);
}
} // namespace cppx
cppx/stdlib/impl/utf8_mode.generic.hpp
#pragma once
#include <stdio.h> // FILE, stdin, stdout, stderr, etc.
#include <cppx/core_language/type_builders.hpp> // cppx::Ptr
#include <cppx/stdlib/Iostream_mode.hpp> // cppx::Iostream_mode
namespace cppx {
inline
auto set_utf8_mode( const Ptr< FILE > )
-> Iostream_mode
{ return Iostream_mode::unknown; }
} // namespace cppx
cppx/stdlib/utf8_mode.hpp
#pragma once
// UTF-8 mode for a stream. For Unix-land this is a no-op & the locale must be UTF-8.
#include <stdio.h> // FILE, needed by the declaration below.
#include <cppx/core_language/type_builders.hpp> // cppx::Ptr
#include <cppx/stdlib/Iostream_mode.hpp>
namespace cppx {
inline
auto set_utf8_mode( const Ptr< FILE > ) -> Iostream_mode;
} // namespace cppx
#ifdef _WIN32 // This also covers 64-bit Windows.
# include "impl/utf8_mode.for_windows.hpp" // Using Windows-specific _setmode.
#else
# include "impl/utf8_mode.generic.hpp" // A do-nothing implementation.
#endif
In addition to setting direct console i/o mode or UTF-8 as appropriate in Windows, the header below fixes the implicit conversion defect; (indirectly) calls setlocale so that wide streams work in Unix-land; sets boolalpha just for good measure, as a more reasonable default; and includes all standard library headers to do with iostreams (I don't show the separate header file that does that, and it is to a degree a personal preference how much to include or whether to do such inclusion at all):
#pragma once
// Standard iostreams but configured to work, plus, as utility, with boolalpha set.
#include <raw_stdlib/iostreams.hpp> // <iostream>, <sstream>, <fstream> etc. for convenience.
#include <cppx/core_language/type_builders.hpp> // cppx::Ptr
#include <cppx/stdlib/utf8_mode.hpp> // stdin etc., cppx::set_utf8_mode
#include <locale> // std::locale
#include <string> // std::string
#include <cppx/stdlib/impl/iostreams_conversion_defect.fix.hpp> // Support arg conv.
inline auto operator<< ( std::wostream& stream, const std::string& s )
-> std::wostream&
{ return (stream << s.c_str()); }
// The following code's sole purpose is to automatically initialize the streams.
namespace cppx { namespace utf8_iostreams {
using std::locale;
using std::ostream;
using std::cin; using std::cout; using std::cerr; using std::clog;
using std::wostream;
using std::wcin; using std::wcout; using std::wcerr; using std::wclog;
using std::boolalpha;
namespace detail {
using std::wstreambuf;
// Based on "Filtering streambufs" code by James Kanze published at
// <url: http://gabisoft.free.fr/articles/fltrsbf1.html>.
class Correcting_input_buffer
: public wstreambuf
{
private:
wstreambuf* provider_;
wchar_t buffer_;
protected:
auto underflow()
-> int_type override
{
if( gptr() < egptr() ) { return *gptr(); }
const int_type result = provider_->sbumpc();
if( result == L'\n' )
{
// Ad hoc workaround for g++ extra newline undesirable behavior:
provider_->pubsync();
}
if( traits_type::not_eof( result ) )
{
buffer_ = result;
setg( &buffer_, &buffer_, &buffer_ + 1 );
}
return result ;
}
public:
Correcting_input_buffer( wstreambuf* a_provider )
: provider_( a_provider )
{}
};
} // namespace detail
class Usage
{
private:
static
void init_once()
{
// In Windows there is no UTF-8 encoding spec for the locale, in Unix-land
// it's the default. From Microsoft's documentation: "If you provide a code
// page like UTF-7 or UTF-8, setlocale will fail, returning NULL". Still
// this call is essential for making the wide streams work correctly in
// Unix-land.
locale::global( locale( "" ) ); // Effects a `setlocale( LC_ALL, "" )`.
for( const Ptr<FILE> c_stream : {stdin, stdout, stderr} )
{
const auto new_mode = set_utf8_mode( c_stream );
if( c_stream == stdin && new_mode == Iostream_mode::direct_io )
{
static detail::Correcting_input_buffer correcting_buffer( wcin.rdbuf() );
wcin.rdbuf( &correcting_buffer );
}
}
for( const Ptr<ostream> stream_ptr : {&cout, &cerr, &clog} )
{
*stream_ptr << boolalpha;
}
for( const Ptr<wostream> stream_ptr : {&wcout, &wcerr, &wclog} )
{
*stream_ptr << boolalpha;
}
}
public:
Usage()
{ static const bool dummy = (init_once(), true); (void) dummy; }
};
namespace detail {
const Usage usage;
} // namespace detail
}} // namespace cppx::utf8_iostreams
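The filtering stream buffer is the least obvious part of the header, so here is a reduced sketch of just that pattern, testable without a console. Forwarding_input_buffer and lines_via_filter are made-up names, and the sketch deliberately leaves out the pubsync call for the g++ newline issue. It only shows how a one-character get area lets every character be pulled from a provider buffer.

```cpp
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical reduced version of the filtering buffer idea: each character
// is fetched from the provider and exposed via a one-character get area.
class Forwarding_input_buffer
    : public std::wstreambuf
{
    std::wstreambuf*    provider_;
    wchar_t             buffer_;

protected:
    auto underflow()
        -> int_type override
    {
        if( gptr() < egptr() ) { return traits_type::to_int_type( *gptr() ); }
        const int_type result = provider_->sbumpc();
        if( !traits_type::eq_int_type( result, traits_type::eof() ) )
        {
            buffer_ = traits_type::to_char_type( result );
            setg( &buffer_, &buffer_, &buffer_ + 1 );
        }
        return result;
    }

public:
    explicit Forwarding_input_buffer( std::wstreambuf* a_provider )
        : provider_( a_provider )
        , buffer_()
    {}
};

// Reads all lines of `text` through the forwarding buffer, joined by '|'.
inline auto lines_via_filter( std::wstring const& text )
    -> std::wstring
{
    std::wistringstream         source( text );
    Forwarding_input_buffer     filter( source.rdbuf() );
    std::wistream               stream( &filter );

    std::wstring result;
    std::wstring line;
    while( std::getline( stream, line ) )
    {
        if( !result.empty() ) { result += L'|'; }
        result += line;
    }
    return result;
}
```

In the real header the provider is the original buffer of wcin, and the extra step after a newline is what works around the g++ behavior.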
The two example programs in the question are fixed simply by including the above header instead of, or in addition to, <iostream>. When it's included in addition, that can be done in a separate translation unit (except for the implicit conversion defect fix: if that's desired the header for it must be included somehow). Or the header can be supplied e.g. as a forced include in the build command.
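Including the header in some other translation unit is enough because the setup is triggered by a namespace scope object, not by any call from the program's own code. A reduced sketch of that idiom, with made-up names (demo_init, init_count) and a counter in place of the real stream setup:

```cpp
namespace demo_init {    // Hypothetical names throughout.

    // Counts how many times the setup code has run.
    inline auto init_count()
        -> int&
    {
        static int count = 0;
        return count;
    }

    class Usage
    {
        static void init_once() { ++init_count(); }

    public:
        Usage()
        {
            // A function-local static is initialized on the first pass
            // only, so init_once runs once however many objects exist.
            static const bool dummy = (init_once(), true);
            (void) dummy;
        }
    };

    namespace detail {
        // Internal linkage: each including translation unit gets its own
        // object, and each one's constructor funnels into the same check.
        const Usage usage;
    }  // namespace detail

}  // namespace demo_init
```

Creating further Usage objects, or including the header in more translation units, leaves the counter at 1.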