I often need to convert BSTR strings to std::wstring. A NULL BSTR counts as an empty BSTR.
I used to do it like this:
#define CHECKNULLSTR(str) ((str) ? (str) : L"")
std::wstring wstr(CHECKNULLSTR(bstr));
It doesn't handle embedded '\0' characters, and it also has to count the characters before it can allocate enough memory, so I assumed it would be slow. I thought of this optimization, which should handle every case, doesn't truncate, and doesn't need to count:
std::wstring wstr(bstr, bstr + ::SysStringLen(bstr));
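To illustrate the truncation point, here is a minimal portable sketch. It assumes a plain wchar_t array (the hypothetical kData and kLen) as a stand-in for a BSTR payload, since a real BSTR would come from ::SysAllocStringLen and report its length through ::SysStringLen:

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a BSTR payload: 5 characters with an embedded L'\0'.
// With a real BSTR, the length would come from ::SysStringLen(bstr).
static const wchar_t kData[] = L"ab\0cd";
static const std::size_t kLen = 5; // length without the final terminator

// C-string constructor: scans for the first L'\0', so it stops after "ab".
std::wstring fromCString()
{
    return std::wstring(kData);
}

// Iterator-pair constructor: copies exactly the range [kData, kData + kLen).
std::wstring fromRange()
{
    return std::wstring(kData, kData + kLen);
}
```

The first function yields a 2-character string, while the second keeps all 5 characters, including the embedded terminator.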
To test the impact of this change, I wrote the following tester. It shows that the optimization takes more than twice as long in most cases. The difference is observable in both Debug and Release configurations, and I'm using VC++ 2013.
Hence my question: what is going on here? How can the "pair of pointers" iterator constructor be so much slower than the C-string constructor?
#include <windows.h>
#include <stdio.h>
#include <tchar.h>
#include <strsafe.h>
#include <iostream>
#include <string>
#define CHECKNULLSTR(str) ((str) ? (str) : L"")
ULONGLONG bstrAllocTest(UINT iterations = 10000)
{
    ULONGLONG totallen = 0;
    ULONGLONG start, stop, elapsed1, elapsed2;
    BSTR bstr = ::SysAllocString( // 15 * 50 = 750 chars
        L"01234567890123456789012345678901234567890123456789" // 1
        L"01234567890123456789012345678901234567890123456789" // 2
        L"01234567890123456789012345678901234567890123456789" // 3
        L"01234567890123456789012345678901234567890123456789" // 4
        L"01234567890123456789012345678901234567890123456789" // 5
        L"01234567890123456789012345678901234567890123456789" // 6
        L"01234567890123456789012345678901234567890123456789" // 7
        L"01234567890123456789012345678901234567890123456789" // 8
        L"01234567890123456789012345678901234567890123456789" // 9
        L"01234567890123456789012345678901234567890123456789" // 10
        L"01234567890123456789012345678901234567890123456789" // 11
        L"01234567890123456789012345678901234567890123456789" // 12
        L"01234567890123456789012345678901234567890123456789" // 13
        L"01234567890123456789012345678901234567890123456789" // 14
        L"01234567890123456789012345678901234567890123456789" // 15
    );
    start = ::GetTickCount64();
    for (UINT i = 1; i <= iterations; ++i)
    {
        std::wstring wstr(CHECKNULLSTR(bstr));
        size_t len;
        ::StringCchLengthW(wstr.c_str(), STRSAFE_MAX_CCH, &len);
        totallen += len;
    }
    stop = ::GetTickCount64();
    elapsed1 = stop - start;
    start = ::GetTickCount64();
    for (UINT i = 1; i <= iterations; ++i)
    {
        std::wstring wstr(bstr, bstr + ::SysStringLen(bstr));
        size_t len;
        ::StringCchLengthW(wstr.c_str(), STRSAFE_MAX_CCH, &len);
        totallen += len;
    }
    stop = ::GetTickCount64();
    elapsed2 = stop - start;
    wprintf_s(L"Iter:\t%u\n"
        L"Elapsed (CHECKNULLSTR):\t%10llu ms\n"
        L"Elapsed (Ptr iter pair):\t%10llu ms\n"
        L"Speed difference:\t%f %%\n",
        iterations,
        elapsed1,
        elapsed2,
        (static_cast<double>(elapsed2) / elapsed1 * 100));
    ::SysFreeString(bstr);
    return totallen;
}
int wmain(int argc, wchar_t* argv[])
{
    ULONGLONG dummylen = bstrAllocTest(100 * 1000);
    wprintf_s(L"\nTotal length:\t%llu", dummylen);
    getchar();
    return 0;
}
Iter: 100000
Elapsed (CHECKNULLSTR): 296 ms
Elapsed (Ptr it pair): 577 ms
Speed difference: 194.932432 %
Total length: 150000000
Interesting and a bit surprising indeed. The difference in performance for Visual C++ 2013 Update 4 is down to the way the two std::wstring constructors are implemented in its standard library. Generally speaking, the constructor taking a pair of iterators has to handle more cases, as those iterators are not necessarily pointers, and they can point to types other than the string's character type (the character type just needs to be constructible from the type pointed to by the iterators). However, I was expecting the implementation to handle your case separately with optimized code.
std::wstring wstr(CHECKNULLSTR(bstr));
indeed scans the string for the terminating 0, then allocates, then copies the string data over in the fastest possible way using memcpy, which is implemented using assembly code.
std::wstring wstr(bstr, bstr + ::SysStringLen(bstr));
indeed avoids the scan because of ::SysStringLen (which is very fast, as it just reads the stored length), then allocates, but then copies the string data over using the following loop:
for (; _First != _Last; ++_First)
    append((size_type)1, (_Elem)*_First);
VC12 decides not to inline the append call (understandably so, the body is pretty big), and all this, as you can imagine, carries quite a bit of overhead compared to a blazing memcpy.
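The two copy strategies can be sketched in portable C++ as follows. This is only an illustration of the shape of the work, not the actual VC12 library code, and the function names copyPerElement and copyBulk are hypothetical:

```cpp
#include <cstddef>
#include <string>

// Per-element strategy, similar in shape to the loop quoted above:
// one append call per character, even though the storage is reserved up front.
std::wstring copyPerElement(const wchar_t* first, const wchar_t* last)
{
    std::wstring result;
    result.reserve(static_cast<std::size_t>(last - first));
    for (; first != last; ++first)
        result.append(std::wstring::size_type(1), *first);
    return result;
}

// Bulk strategy: the library receives the whole block at once,
// which lets it copy the characters in a single operation.
std::wstring copyBulk(const wchar_t* first, std::size_t count)
{
    std::wstring result;
    result.assign(first, count);
    return result;
}
```

Both functions produce the same string; the difference is only in how many library calls are made per character copied.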
One solution is to use the std::basic_string constructor that takes a pointer and a count (also mentioned by Ben Voigt in his comment), like this:
std::wstring wstr(CHECKNULLSTR(bstr), ::SysStringLen(bstr));
I've just tested it, and it does bring the expected benefits on Visual C++ 2013 - it sometimes takes just half the time of the first version, and about 75% in the worst case (these are approximate measurements anyway).
The standard library implementation in Visual C++ 2015 CTP6 has an optimized code path for the constructor taking an iterator pair when the iterators are actually pointers to the same character type as the string to be constructed, resulting in essentially the same code as the pointer-and-count variant above. So, on this version, it doesn't matter which of these two constructor variants you use for your case - they're both faster than the version taking only a pointer.
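If the conversion is needed in many places, the pointer-and-count approach could be wrapped in a small helper. The sketch below is a portable approximation under stated assumptions: toWString is a hypothetical name, and it takes a plain wchar_t pointer plus a length so that it compiles without the Windows headers. With a real BSTR, the call would presumably be toWString(bstr, ::SysStringLen(bstr)), given that ::SysStringLen returns 0 for a NULL BSTR.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds a std::wstring from a pointer and an explicit
// length, so no scan for the terminator is needed and embedded L'\0'
// characters are preserved. A null pointer is treated as an empty string.
std::wstring toWString(const wchar_t* data, std::size_t length)
{
    if (data == nullptr || length == 0)
        return std::wstring();
    return std::wstring(data, length); // pointer-and-count constructor
}
```

Checking for null inside the helper replaces the CHECKNULLSTR macro at the call site, which keeps the NULL handling in one place.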