
Why is my BSTR to std::wstring conversion so slow? Is my tester bad?

I often need to convert BSTR strings to std::wstring. A NULL BSTR counts as an empty BSTR.

I used to do it like this:

#define CHECKNULLSTR(str) ((str) ? (str) : L"")
std::wstring wstr(CHECKNULLSTR(bstr));

It stops at the first internal '\0' char, and it also has to scan the string to count the characters before it can allocate enough memory, so I assumed it was slow. I thought of this optimization, which should handle every case: it doesn't truncate and doesn't need to count:

std::wstring wstr(bstr, bstr + ::SysStringLen(bstr));
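To make the behavioral difference between the two constructors concrete, here is a minimal portable sketch. It assumes a plain wchar_t array as a stand-in for the BSTR payload and a constant in place of ::SysStringLen; kData, kLen, from_c_string and from_range are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a BSTR payload: "ab", an embedded L'\0', then "cd",
// followed by the usual terminator. A real BSTR stores its length separately.
static const wchar_t kData[] = { L'a', L'b', L'\0', L'c', L'd', L'\0' };
static const std::size_t kLen = 5; // what ::SysStringLen would report here

// C-string constructor: scans for the first L'\0' and stops there.
std::wstring from_c_string(const wchar_t* data)
{
    assert(data != nullptr); // this constructor must not be given a null pointer
    return std::wstring(data);
}

// Iterator-pair constructor: copies exactly [data, data + len), embedded nulls included.
std::wstring from_range(const wchar_t* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    return std::wstring(data, data + len);
}
```

With this sample data, from_c_string(kData) holds only the two characters before the embedded null, while from_range(kData, kLen) holds all five.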

To test the impact of this change, I wrote the following tester. It shows that the optimization takes more than twice as long in most cases. The change is observable in both Debug and Release configurations, and I'm using VC++ 2013.

Hence my question: what is going on here? How can the "pair of pointers" iterator constructor be so much slower than the C-string constructor?

Complete tester

#include <windows.h>
#include <stdio.h>
#include <tchar.h>
#include <strsafe.h>
#include <iostream>
#include <string>

#define CHECKNULLSTR(str) ((str) ? (str) : L"")

ULONGLONG bstrAllocTest(UINT iterations = 10000)
{
    ULONGLONG totallen = 0;
    ULONGLONG start, stop, elapsed1, elapsed2;    
    BSTR bstr = ::SysAllocString( // 15 * 50 = 750 chars
                     L"01234567890123456789012345678901234567890123456789" //  1
                     L"01234567890123456789012345678901234567890123456789" //  2
                     L"01234567890123456789012345678901234567890123456789" //  3
                     L"01234567890123456789012345678901234567890123456789" //  4
                     L"01234567890123456789012345678901234567890123456789" //  5
                     L"01234567890123456789012345678901234567890123456789" //  6
                     L"01234567890123456789012345678901234567890123456789" //  7
                     L"01234567890123456789012345678901234567890123456789" //  8
                     L"01234567890123456789012345678901234567890123456789" //  9
                     L"01234567890123456789012345678901234567890123456789" // 10
                     L"01234567890123456789012345678901234567890123456789" // 11
                     L"01234567890123456789012345678901234567890123456789" // 12
                     L"01234567890123456789012345678901234567890123456789" // 13
                     L"01234567890123456789012345678901234567890123456789" // 14
                     L"01234567890123456789012345678901234567890123456789" // 15
                                );

    start = ::GetTickCount64();
    for (UINT i = 1; i <= iterations; ++i)
    {
        std::wstring wstr(CHECKNULLSTR(bstr));
        size_t len;
        ::StringCchLengthW(wstr.c_str(), STRSAFE_MAX_CCH, &len);
        totallen += len;
    }
    stop = ::GetTickCount64();
    elapsed1 = stop - start;

    start = ::GetTickCount64();
    for (UINT i = 1; i <= iterations; ++i)
    {
        std::wstring wstr(bstr, bstr + ::SysStringLen(bstr));
        size_t len;
        ::StringCchLengthW(wstr.c_str(), STRSAFE_MAX_CCH, &len);
        totallen += len;
    }
    stop = ::GetTickCount64();
    elapsed2 = stop - start;

    wprintf_s(L"Iter:\t%u\n"
              L"Elapsed (CHECKNULLSTR):\t%10llu ms\n"
              L"Elapsed (Ptr iter pair):\t%10llu ms\n"
              L"Speed difference:\t%f %%\n",
              iterations,
              elapsed1,
              elapsed2,
              (static_cast<double>(elapsed2) / elapsed1 * 100));

    ::SysFreeString(bstr);
    return totallen;
}

int wmain(int argc, wchar_t* argv[])
{
    ULONGLONG dummylen = bstrAllocTest(100 * 1000);
    wprintf_s(L"\nTotal length:\t%llu", dummylen);
    getchar();
    return 0;
}

Output on my system

Iter:   100000
Elapsed (CHECKNULLSTR):        296 ms
Elapsed (Ptr it pair):         577 ms
Speed difference:       194.932432 %

Total length:   150000000
Felix Dombek asked Apr 10 '15 15:04


1 Answer

Interesting, and a bit surprising. In Visual C++ 2013 Update 4, the performance difference comes down to how the two std::wstring constructors are implemented in its standard library. Generally speaking, the constructor taking a pair of iterators has to handle more cases: the iterators are not necessarily pointers, and they can point to a type other than the string's character type (the character type just needs to be constructible from the type the iterators point to). Still, I would have expected the implementation to handle your case separately with optimized code.

std::wstring wstr(CHECKNULLSTR(bstr)); does scan the string for the terminating null character, then allocates, then copies the string data over in the fastest possible way using memcpy, which is implemented in assembly.

std::wstring wstr(bstr, bstr + ::SysStringLen(bstr)); indeed avoids the scan because of ::SysStringLen (which is very fast, just reads the stored length), then allocates, but then copies the string data over using the following loop:

for (; _First != _Last; ++_First)
   append((size_type)1, (_Elem)*_First);

VC12 decides not to inline the append call (understandably so, the body is pretty big), and all this, as you can imagine, carries quite a bit of overhead compared to a blazing memcpy.
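The two copy strategies can be put side by side in a small portable sketch. This is not the library's actual code: copy_per_char and copy_bulk are hypothetical helpers, and the claim that assign with a pointer and a count ends up as a block copy is an assumption about typical implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Mirrors the loop quoted above: one append call per character.
std::wstring copy_per_char(const wchar_t* first, const wchar_t* last)
{
    assert(first <= last);
    std::wstring out;
    out.reserve(static_cast<std::size_t>(last - first)); // allocate once up front
    for (; first != last; ++first)
        out.append(static_cast<std::wstring::size_type>(1), *first);
    return out;
}

// Pointer-and-count copy: a single call that the library can
// (and typically does) turn into one block copy.
std::wstring copy_bulk(const wchar_t* first, const wchar_t* last)
{
    assert(first <= last);
    std::wstring out;
    out.assign(first, static_cast<std::size_t>(last - first));
    return out;
}
```

Both helpers produce the same string; only the amount of per-character work differs, which is where the overhead described above comes from.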


One solution is to use the std::basic_string constructor that takes a pointer and a count (also mentioned by Ben Voigt in his comment), like this:

std::wstring wstr(CHECKNULLSTR(bstr), ::SysStringLen(bstr));

I've just tested it, and it does bring the expected benefits on Visual C++ 2013 - it sometimes takes just half the time of the first version, and about 75% in the worst case (these are approximate measurements anyway).
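If this conversion is needed in many places, the pointer-and-count form can be wrapped in a small helper. The sketch below is portable C++ and takes the pointer and length as separate arguments; make_wstring is a hypothetical name, and on Windows the call site would presumably be make_wstring(bstr, ::SysStringLen(bstr)).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds a std::wstring from a pointer and an explicit length.
// A null pointer is treated as an empty string, matching the
// "NULL BSTR counts as an empty BSTR" convention from the question.
std::wstring make_wstring(const wchar_t* data, std::size_t length)
{
    assert(data != nullptr || length == 0); // a null pointer must come with length 0
    return data ? std::wstring(data, length) : std::wstring();
}
```

Because the length is passed in, no scan for a terminator is needed and embedded '\0' characters are preserved.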


The standard library implementation in Visual C++ 2015 CTP6 has an optimized code path for the constructor taking an iterator pair when the iterators are actually pointers to the same character type as the string to be constructed, resulting in essentially the same code as the pointer-and-count variant above. So, on this version, it doesn't matter which of these two constructor variants you use for your case - they're both faster than the version taking only a pointer.
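The idea of such an optimized code path can be illustrated with tag dispatch. This is only a sketch of the general technique, not the Visual C++ 2015 source; build and build_impl are hypothetical names, and the generic path uses push_back for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Generic path: works for any input iterator, one element at a time.
template <class It>
std::wstring build_impl(It first, It last, std::false_type)
{
    std::wstring out;
    for (; first != last; ++first)
        out.push_back(static_cast<wchar_t>(*first));
    return out;
}

// Fast path: the iterators are pointers to wchar_t, so the
// pointer-and-count constructor can copy the whole range at once.
template <class It>
std::wstring build_impl(It first, It last, std::true_type)
{
    assert(first <= last);
    return std::wstring(first, static_cast<std::size_t>(last - first));
}

// Picks the fast path at compile time when It is (const) wchar_t*.
template <class It>
std::wstring build(It first, It last)
{
    typedef typename std::remove_cv<
        typename std::remove_pointer<It>::type>::type Elem;
    typedef std::integral_constant<bool,
        std::is_pointer<It>::value &&
        std::is_same<Elem, wchar_t>::value> IsWidePointer;
    return build_impl(first, last, IsWidePointer());
}
```

Calling build with two wchar_t pointers selects the block-copy overload, while iterators over another character type fall back to the element-by-element loop.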

bogdan answered Nov 14 '22 21:11