I'm looking into some of my old (and exclusively Win32-oriented) code and thinking about making it more modern and portable, i.e. reimplementing some widely reusable parts in C++11. One of these parts is converting between UTF-8 and UTF-16. In the Win32 API I use MultiByteToWideChar/WideCharToMultiByte; I tried to port that code to C++11 using the sample code from here: https://stackoverflow.com/a/14809553. The results are:
Release build (compiled with MSVS 2013, run on a Core i7-3610QM):

    stdlib = 1587.2 ms
    Win32  =  127.2 ms

Debug build:

    stdlib = 5733.8 ms
    Win32  =  127.2 ms
The question is: is there something wrong with the code? And if the code is OK, is there a good reason for such a performance difference?
Test code is below:
#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <clocale>
#include <locale>    // std::wstring_convert
#include <codecvt>   // std::codecvt_utf8_utf16
#include <cstdio>    // printf, fopen, fwrite; sprintf_s on MSVC
#include <windows.h> // QueryPerformanceCounter, MultiByteToWideChar, WideCharToMultiByte
#define XU_BEGIN_TIMER(NAME)                                                \
    {                                                                       \
        LARGE_INTEGER __freq;                                               \
        LARGE_INTEGER __t0;                                                 \
        LARGE_INTEGER __t1;                                                 \
        double __tms;                                                       \
        const char* __tname = NAME;                                         \
        char __tbuf[0xff];                                                  \
                                                                            \
        QueryPerformanceFrequency(&__freq);                                 \
        QueryPerformanceCounter(&__t0);

#define XU_END_TIMER()                                                      \
        QueryPerformanceCounter(&__t1);                                     \
        __tms = (__t1.QuadPart - __t0.QuadPart) * 1000.0 / __freq.QuadPart; \
        sprintf_s(__tbuf, sizeof(__tbuf), " %-24s = %6.1f ms\n",            \
                  __tname, __tms);                                          \
        OutputDebugStringA(__tbuf);                                         \
        printf(__tbuf);                                                     \
    }
std::string read_utf8() {
    std::ifstream infile("C:/temp/UTF-8-demo.txt");
    std::string fileData((std::istreambuf_iterator<char>(infile)),
                         std::istreambuf_iterator<char>());
    infile.close();
    return fileData;
}
void testMethod() {
    // Note: the MSVC CRT typically rejects "en_US.UTF-8", but the codecvt
    // conversion below does not depend on the global locale anyway.
    std::setlocale(LC_ALL, "en_US.UTF-8");
    std::string source = read_utf8();

    {
        std::string utf8;
        XU_BEGIN_TIMER("stdlib") {
            for( int i = 0; i < 1000; i++ ) {
                std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert2utf16;
                std::u16string utf16 = convert2utf16.from_bytes(source);

                std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert2utf8;
                utf8 = convert2utf8.to_bytes(utf16);
            }
        } XU_END_TIMER();

        FILE* output = fopen("c:\\temp\\utf8-std.dat", "wb");
        fwrite(utf8.c_str(), 1, utf8.length(), output);
        fclose(output);
    }

    char* utf8 = NULL;
    int cchA = 0;
    {
        XU_BEGIN_TIMER("Win32") {
            for( int i = 0; i < 1000; i++ ) {
                WCHAR* utf16 = new WCHAR[source.length() + 1];
                int cchW;

                // The round trip reproduces the original UTF-8 data, so the
                // input length is a sufficient output buffer size here.
                utf8 = new char[source.length() + 1];

                cchW = MultiByteToWideChar(
                    CP_UTF8, 0, source.c_str(), source.length(),
                    utf16, source.length() + 1);
                cchA = WideCharToMultiByte(
                    CP_UTF8, 0, utf16, cchW,
                    utf8, source.length() + 1, NULL, NULL);

                delete[] utf16;
                if( i != 999 )  // keep the last iteration's result for the file dump
                    delete[] utf8;
            }
        } XU_END_TIMER();

        FILE* output = fopen("c:\\temp\\utf8-win.dat", "wb");
        fwrite(utf8, 1, cchA, output);
        fclose(output);
        delete[] utf8;
    }
}
In my own testing, I found that the constructor call for wstring_convert has massive overhead, at least on Windows. As other answers suggest, you'll probably struggle to beat the native Windows implementation, but try modifying your code so the converter is constructed outside the loop. I'd expect an improvement of somewhere between 5x and 20x, particularly in a debug build.
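As a rough sketch of that change (reusing the source string and the XU_* timer macros from your question; the "stdlib-hoisted" label is just my name for the variant), the timing loop would become:

    std::string utf8;
    XU_BEGIN_TIMER("stdlib-hoisted") {
        // Construct the converter once; construction, not conversion,
        // is where most of the time goes. One instance handles both
        // directions via from_bytes() and to_bytes().
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
        for( int i = 0; i < 1000; i++ ) {
            std::u16string utf16 = converter.from_bytes(source);
            utf8 = converter.to_bytes(utf16);
        }
    } XU_END_TIMER();

Note that a wstring_convert instance holds conversion state and is not safe to share across threads, so in concurrent code you'd want one converter per thread.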