I'm looking into some of my old (and exclusively Win32-oriented) code and thinking about making it more modern and portable, i.e. reimplementing some widely reusable parts in C++11. One of these parts is converting between UTF-8 and UTF-16. In the Win32 API I use MultiByteToWideChar/WideCharToMultiByte; I tried to port that code to C++11 using the sample code from here: https://stackoverflow.com/a/14809553. The results are:
Release build (compiled with MSVS 2013, run on a Core i7-3610QM):

    stdlib = 1587.2 ms
    Win32  =  127.2 ms

Debug build:

    stdlib = 5733.8 ms
    Win32  =  127.2 ms
The question is: is there something wrong with the code? And if the code is OK, is there a good reason for such a performance difference?
Test code is below:
#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <clocale>
#include <locale>    // std::wstring_convert
#include <codecvt>   // std::codecvt_utf8_utf16
#include <cstdio>    // printf, fopen, fwrite; sprintf_s on MSVC
#include <windows.h> // QueryPerformanceCounter, MultiByteToWideChar, WideCharToMultiByte
#define XU_BEGIN_TIMER(NAME)                                                \
    {                                                                       \
        LARGE_INTEGER __freq;                                               \
        LARGE_INTEGER __t0;                                                 \
        LARGE_INTEGER __t1;                                                 \
        double __tms;                                                       \
        const char* __tname = NAME;                                         \
        char __tbuf[0xff];                                                  \
                                                                            \
        QueryPerformanceFrequency(&__freq);                                 \
        QueryPerformanceCounter(&__t0);

#define XU_END_TIMER()                                                      \
        QueryPerformanceCounter(&__t1);                                     \
        __tms = (__t1.QuadPart - __t0.QuadPart) * 1000.0 / __freq.QuadPart; \
        sprintf_s(__tbuf, sizeof(__tbuf), " %-24s = %6.1f ms\n",            \
                  __tname, __tms);                                          \
        OutputDebugStringA(__tbuf);                                         \
        printf(__tbuf);                                                     \
    }
std::string read_utf8() {
    std::ifstream infile("C:/temp/UTF-8-demo.txt");
    std::string fileData((std::istreambuf_iterator<char>(infile)),
                         std::istreambuf_iterator<char>());
    infile.close();
    return fileData;
}
void testMethod() {
    // Note: the MSVC CRT typically rejects "en_US.UTF-8", but the codecvt
    // conversion below does not depend on the global locale anyway.
    std::setlocale(LC_ALL, "en_US.UTF-8");
    std::string source = read_utf8();

    {
        std::string utf8;
        XU_BEGIN_TIMER("stdlib") {
            for( int i = 0; i < 1000; i++ ) {
                std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert2utf16;
                std::u16string utf16 = convert2utf16.from_bytes(source);

                std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert2utf8;
                utf8 = convert2utf8.to_bytes(utf16);
            }
        } XU_END_TIMER();

        FILE* output = fopen("c:\\temp\\utf8-std.dat", "wb");
        fwrite(utf8.c_str(), 1, utf8.length(), output);
        fclose(output);
    }

    char* utf8 = NULL;
    int cchA = 0;
    {
        XU_BEGIN_TIMER("Win32") {
            for( int i = 0; i < 1000; i++ ) {
                WCHAR* utf16 = new WCHAR[source.length() + 1];
                int cchW;

                // The round trip reproduces the original UTF-8 data, so the
                // input length is a sufficient output buffer size here.
                utf8 = new char[source.length() + 1];

                cchW = MultiByteToWideChar(
                    CP_UTF8, 0, source.c_str(), source.length(),
                    utf16, source.length() + 1);
                cchA = WideCharToMultiByte(
                    CP_UTF8, 0, utf16, cchW,
                    utf8, source.length() + 1, NULL, NULL);

                delete[] utf16;
                if( i != 999 )  // keep the last iteration's result for the file dump
                    delete[] utf8;
            }
        } XU_END_TIMER();

        FILE* output = fopen("c:\\temp\\utf8-win.dat", "wb");
        fwrite(utf8, 1, cchA, output);
        fclose(output);
        delete[] utf8;
    }
}
In my own testing, I found that the constructor call for wstring_convert has massive overhead, at least on Windows. As other answers suggest, you'll probably struggle to beat the native Windows implementation, but try modifying your code so the converter is constructed outside the loop. I'd expect an improvement of somewhere between 5x and 20x, particularly in a debug build.
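As a rough sketch of that change (reusing the source string and the XU_* timer macros from your question; the "stdlib-hoisted" label is just my name for the variant), the timing loop would become:

    std::string utf8;
    XU_BEGIN_TIMER("stdlib-hoisted") {
        // Construct the converter once; construction, not conversion,
        // is where most of the time goes. One instance handles both
        // directions via from_bytes() and to_bytes().
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
        for( int i = 0; i < 1000; i++ ) {
            std::u16string utf16 = converter.from_bytes(source);
            utf8 = converter.to_bytes(utf16);
        }
    } XU_END_TIMER();

Note that a wstring_convert instance holds conversion state and is not safe to share across threads, so in concurrent code you'd want one converter per thread.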