
How to define string literal with character type that depends on template parameter?

template<typename CharType>
class StringTraits {
public:
    static const CharType NULL_CHAR = '\0';
    static constexpr CharType* WHITESPACE_STR = " ";
};

typedef StringTraits<char> AStringTraits;
typedef StringTraits<wchar_t> WStringTraits;

I know I could do it with template specialization, but this would require some duplication (by defining string literals with and without L prefix).

Is there a simpler way to define const/constexpr char/wchar_t and char*/wchar_t* with same string literal in a template class?

asked Oct 10 '18 10:10 by user1633272



1 Answer

There are several ways to do this, depending on the available version of the C++ standard. If you have C++17 available, you can scroll down to Method 3, which is the most elegant solution in my opinion.

Note: Methods 1 and 3 assume that the characters of the string literal will be restricted to 7-bit ASCII. This requires that characters are in the range [0..127] and the execution character set is compatible with 7-bit ASCII (e. g. Windows-1252 or UTF-8). Otherwise the simple casting of char values to wchar_t used by these methods won't give the correct result.

Method 1 - aggregate initialization (C++03)

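The simplest way is to define an array using aggregate initialization. The snippet below uses constexpr and therefore needs C++11; in C++03, declare the member as const and define it outside the class instead: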

template<typename CharType>
class StringTraits {
public:
    static const CharType NULL_CHAR = '\0';
    static constexpr CharType WHITESPACE_STR[] = {'a','b','c',0};
};
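As a rough sketch of how Method 1 behaves for both character types, the following variant can be checked at compile time. The array contents, the out-of-class definitions and the length() helper are my own additions for illustration, not part of the original snippet. The out-of-class definitions are only needed before C++17, once the members are odr-used (for example, when the array decays to a pointer).

```cpp
#include <cstddef>

template<typename CharType>
struct StringTraits {
    static constexpr CharType NULL_CHAR = '\0';
    // Illustrative contents: space, tab, newline, then the terminator.
    static constexpr CharType WHITESPACE_STR[] = {' ', '\t', '\n', 0};
};

// Out-of-class definitions. Required before C++17 when the members are
// odr-used; redundant (but still accepted) from C++17 on.
template<typename CharType>
constexpr CharType StringTraits<CharType>::NULL_CHAR;
template<typename CharType>
constexpr CharType StringTraits<CharType>::WHITESPACE_STR[];

// Hypothetical helper: counts characters up to the terminator.
// Written recursively so that it is a valid C++11 constexpr function.
template<typename CharType>
constexpr std::size_t length(const CharType* s) {
    return *s == CharType(0) ? 0 : 1 + length(s + 1);
}

// The same initializer list yields a char array and a wchar_t array.
static_assert(length(StringTraits<char>::WHITESPACE_STR) == 3, "narrow");
static_assert(length(StringTraits<wchar_t>::WHITESPACE_STR) == 3, "wide");
```

The character-by-character list is what lets one initializer serve both types: each char constant converts implicitly to CharType.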

Method 2 - template specialization and macro (C++03)

(Another variant is shown in this answer.)

The aggregate initialization method can be cumbersome for long strings. For more convenience, we can combine template specialization with macros (as with Method 1, the constexpr used here needs C++11):

template< typename CharT > constexpr CharT const* NarrowOrWide( char const*, wchar_t const* );
template<> constexpr char const* NarrowOrWide< char >( char const* c, wchar_t const* )       
    { return c; }
template<> constexpr wchar_t const* NarrowOrWide< wchar_t >( char const*, wchar_t const* w ) 
    { return w; }

#define TOWSTRING1(x) L##x
#define TOWSTRING(x) TOWSTRING1(x)  
#define NARROW_OR_WIDE( C, STR ) NarrowOrWide< C >( ( STR ), TOWSTRING( STR ) )

Usage:

template<typename CharType>
class StringTraits {
public:
    static constexpr CharType const* WHITESPACE_STR = NARROW_OR_WIDE( CharType, " " );
};

Live Demo at Coliru

Explanation:

The template function NarrowOrWide() returns either the first (char const*) or the second (wchar_t const*) argument, depending on template parameter CharT.

The macro NARROW_OR_WIDE is used to avoid having to write both the narrow and the wide string literal. The macro TOWSTRING simply prepends the L prefix to the given string literal.

Of course the macro will only work if the range of characters is limited to basic ASCII, but this is usually sufficient. Otherwise one can use the NarrowOrWide() template function to define narrow and wide string literals separately.
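For the case where the two literals are written separately, a minimal sketch could look like this. The AccentedTraits struct and its text are hypothetical. The universal character names are my choice to keep the source file itself ASCII, and how the narrow literal is encoded depends on the compiler's execution character set.

```cpp
// Function template from Method 2, repeated so the sketch is self-contained.
template< typename CharT > constexpr CharT const* NarrowOrWide( char const*, wchar_t const* );
template<> constexpr char const* NarrowOrWide< char >( char const* c, wchar_t const* )
    { return c; }
template<> constexpr wchar_t const* NarrowOrWide< wchar_t >( char const*, wchar_t const* w )
    { return w; }

// Hypothetical traits class holding a non-ASCII string.
template< typename CharT >
struct AccentedTraits
{
    // Both literals are spelled out; the template argument selects one.
    static constexpr CharT const* STR =
        NarrowOrWide< CharT >( "\u00E4\u00F6\u00FC", L"\u00E4\u00F6\u00FC" );
};

// The wide variant holds exactly three wchar_t code units plus the terminator.
static_assert( AccentedTraits< wchar_t >::STR[ 0 ] == L'\u00E4', "first wide character" );
static_assert( AccentedTraits< wchar_t >::STR[ 3 ] == L'\0', "wide terminator" );
```

This brings back the duplication the question wanted to avoid, but only for the strings that actually need it.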

Notes:

I would add a "unique" prefix to the macro names, something like the name of your library, to avoid conflicts with similar macros defined elsewhere.


Method 3 - array initialized via template parameter pack (C++17)

C++17 finally allows us to get rid of the macro and use a pure C++ solution. The solution uses template parameter pack expansion to initialize an array from a string literal while static_casting the individual characters to the desired type.

First we declare a str_array class, which is similar to std::array but tailored for constant null-terminated strings (e.g. str_array::size() returns the number of characters without '\0', instead of the buffer size). This wrapper class is necessary because a plain array cannot be returned from a function; it must be wrapped in a struct or class.

template< typename CharT, std::size_t Length >
struct str_array
{
    constexpr CharT const* c_str()              const { return data_; }
    constexpr CharT const* data()               const { return data_; }
    constexpr CharT operator[]( std::size_t i ) const { return data_[ i ]; }
    constexpr CharT const* begin()              const { return data_; }
    constexpr CharT const* end()                const { return data_ + Length; }
    constexpr std::size_t size()                const { return Length; }
    // TODO: add more members of std::basic_string

    CharT data_[ Length + 1 ];  // +1 for null-terminator
};
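To make the difference between size() and the buffer size concrete, here is a small sketch that initializes a str_array by hand. The struct is repeated in shortened form so the example compiles on its own; the variable name abc is made up for illustration.

```cpp
#include <cstddef>

template< typename CharT, std::size_t Length >
struct str_array
{
    constexpr CharT const* c_str()              const { return data_; }
    constexpr CharT operator[]( std::size_t i ) const { return data_[ i ]; }
    constexpr CharT const* begin()              const { return data_; }
    constexpr CharT const* end()                const { return data_ + Length; }
    constexpr std::size_t size()                const { return Length; }

    CharT data_[ Length + 1 ];  // +1 for null-terminator
};

// Hand-written aggregate initialization: three characters plus the terminator.
constexpr str_array< wchar_t, 3 > abc = { { L'a', L'b', L'c', 0 } };

static_assert( abc.size() == 3, "size() excludes the terminator" );
static_assert( sizeof( abc.data_ ) / sizeof( wchar_t ) == 4, "the buffer includes it" );
static_assert( abc.end() - abc.begin() == 3, "end() points at the terminator" );
static_assert( *abc.end() == L'\0', "the string is null-terminated" );
```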

So far, nothing special. The real trickery is done by the following str_array_cast() function, which initializes the str_array from a string literal while static_casting the individual characters to the desired type:

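#include <cstddef>    // std::size_t
#include <stdexcept>  // std::out_of_range
#include <utility>    // std::index_sequence, std::make_index_sequence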

namespace detail {
    template< typename ResT, typename SrcT >
    constexpr ResT static_cast_ascii( SrcT x )
    {
        if( !( x >= 0 && x <= 127 ) )
            throw std::out_of_range( "Character value must be in basic ASCII range (0..127)" );
        return static_cast<ResT>( x );
    }
    
    template< typename ResElemT, typename SrcElemT, std::size_t N, std::size_t... I >
    constexpr str_array< ResElemT, N - 1 > do_str_array_cast( const SrcElemT(&a)[N], std::index_sequence<I...> )
    {
        return { static_cast_ascii<ResElemT>( a[I] )..., 0 };
    }
} //namespace detail

template< typename ResElemT, typename SrcElemT, std::size_t N, typename Indices = std::make_index_sequence< N - 1 > >
constexpr str_array< ResElemT, N - 1 > str_array_cast( const SrcElemT(&a)[N] )
{
    return detail::do_str_array_cast< ResElemT >( a, Indices{} );
}

The template parameter pack expansion trickery is required because constant arrays can only be initialized via aggregate initialization (e.g. const str_array<char,3> s = {'a','b','c',0};), so we have to "convert" the string literal to such an initializer list.

The code triggers a compile-time error if any character is outside of the basic ASCII range (0..127), for the reasons given at the beginning of this answer: reaching the throw during constant evaluation means the initializer is no longer a constant expression. There are code pages where 0..127 doesn't map to ASCII, so this check does not give 100% safety though.
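The throw-in-constexpr mechanism can be seen in isolation in the following sketch. The function ascii_to_wide is a hypothetical stand-in for detail::static_cast_ascii, written with a conditional expression so that it is also accepted as a C++11 constexpr function.

```cpp
#include <stdexcept>

// Hypothetical stand-in for detail::static_cast_ascii.
// The throw is only evaluated when the range check fails.
constexpr wchar_t ascii_to_wide( char c )
{
    return ( c >= 0 && c <= 127 )
        ? static_cast<wchar_t>( c )
        : throw std::out_of_range( "Character value must be in basic ASCII range (0..127)" );
}

// In range: the call is a constant expression.
constexpr wchar_t wide_a = ascii_to_wide( 'a' );
static_assert( wide_a == L'a', "converted at compile time" );

// Out of range: evaluation would reach the throw, so the initializer is not
// a constant expression and the line below is rejected by the compiler.
//constexpr wchar_t bad = ascii_to_wide( '\xE4' );
```

When the same function is called with a value only known at run time, a failed check throws std::out_of_range instead.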

Usage:

template< typename CharT >
struct StringTraits
{
    static constexpr auto WHITESPACE_STR = str_array_cast<CharT>( "abc" );
    
    // Fails to compile (as intended), because characters are not basic ASCII.
    //static constexpr auto WHITESPACE_STR1 = str_array_cast<CharT>( "äöü" );
};
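To check the whole of Method 3 in a single translation unit, the pieces can be condensed roughly as follows. This is a shortened restatement rather than new functionality: str_array keeps only three members, the range check uses a conditional expression, and the string " \t\n" is an arbitrary example value.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>

template< typename CharT, std::size_t Length >
struct str_array
{
    constexpr CharT const* c_str()              const { return data_; }
    constexpr CharT operator[]( std::size_t i ) const { return data_[ i ]; }
    constexpr std::size_t size()                const { return Length; }
    CharT data_[ Length + 1 ];  // +1 for null-terminator
};

namespace detail {
    template< typename ResT, typename SrcT >
    constexpr ResT static_cast_ascii( SrcT x )
    {
        // Same range check as above, in conditional-expression form.
        return ( x >= 0 && x <= 127 )
            ? static_cast<ResT>( x )
            : throw std::out_of_range( "Character value must be in basic ASCII range (0..127)" );
    }

    template< typename ResT, typename SrcT, std::size_t N, std::size_t... I >
    constexpr str_array< ResT, N - 1 > do_str_array_cast( const SrcT(&a)[N], std::index_sequence<I...> )
    {
        // Expands to { { cast(a[0]), cast(a[1]), ..., 0 } }.
        return { { static_cast_ascii<ResT>( a[I] )..., 0 } };
    }
} //namespace detail

template< typename ResT, typename SrcT, std::size_t N >
constexpr str_array< ResT, N - 1 > str_array_cast( const SrcT(&a)[N] )
{
    return detail::do_str_array_cast< ResT >( a, std::make_index_sequence< N - 1 >{} );
}

template< typename CharT >
struct StringTraits
{
    static constexpr auto WHITESPACE_STR = str_array_cast<CharT>( " \t\n" );
};

// One narrow literal produces both variants.
static_assert( StringTraits< char >::WHITESPACE_STR.size() == 3, "narrow" );
static_assert( StringTraits< wchar_t >::WHITESPACE_STR[ 1 ] == L'\t', "wide" );
```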

Live Demo at Coliru

answered Nov 10 '22 00:11 by zett42