I've done a few years of c# now, and I'm trying to learn some new stuff. So I decided to have a look at c++, to get to know programming in a different way. I've been doing loads of reading, but I just started writing some code today. On my Windows 7/64 bit machine, running VS2010, I created two projects: 1) A c# project that lets me write things the way I'm used to. 2) A c++ "makefile" project that let's me play around, trying to implement the same thing. From what I understand, this ISN'T a .NET project. I got to trying to populate a dictionary with 10K values. For some reason, the c++ is orders of magnitude slower. Here's the c# below. Note I put in a function after the time measurement to ensure it wasn't "optimized" away by the compiler: <pre class="prettyprint"><code>var freq = System.Diagnostics.Stopwatch.Frequency; int i; Dictionary<int, int> dict = new Dictionary<int, int>(); var clock = System.Diagnostics.Stopwatch.StartNew(); for (i = 0; i < 10000; i++) dict[i] = i; clock.Stop(); Console.WriteLine(clock.ElapsedTicks / (decimal)freq * 1000M); Console.WriteLine(dict.Average(x=>x.Value)); Console.ReadKey(); //Don't want results to vanish off screen </code></pre> Here's the c++, not much thought has gone into it (trying to learn, right?) int input; <pre class="prettyprint"><code>LARGE_INTEGER frequency; // ticks per second LARGE_INTEGER t1, t2; // ticks double elapsedTime; // get ticks per second QueryPerformanceFrequency(&frequency); int i; boost::unordered_map<int, int> dict; // start timer QueryPerformanceCounter(&t1); for (i=0;i<10000;i++) dict[i]=i; // stop timer QueryPerformanceCounter(&t2); // compute and print the elapsed time in millisec elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart; cout << elapsedTime << " ms insert time\n"; int input; cin >> input; //don't want console to disappear </code></pre> Now, some caveats. I managed to find this related SO question. One of the guys wrote a long answer mentioning WOW64 skewing the results. I've set the project to release and gone through the "properties" tab of the c++ project, enabling everything that sounded like it would make it fast. Changed the platform to x64, though I'm not sure whether that addresses his wow64 issue. I'm not that experienced with the compiler options, perhaps you guys have more of a clue? Oh, and the results: c#:0.32ms c++: 8.26ms. This is a bit strange. Have I misinterpreted something about what .Quad means? I copied the c++ timer code from someplace on the web, going through all the boost installation and include/libfile rigmarole. Or perhaps I am actually using different instruments unwittingly? Or there's some critical compile option that I haven't used? Or maybe the c# code is optimized because the average is a constant? Here's the c++ command line, from the Property page->C/C++->Command Line: /I"C:\Users\Carlos\Desktop\boost_1_47_0" /Zi /nologo /W3 /WX- /MP /Ox /Oi /Ot /GL /D "_MBCS" /Gm- /EHsc /GS- /Gy- /arch:SSE2 /fp:fast /Zc:wchar_t /Zc:forScope /Fp"x64\Release\MakeTest.pch" /Fa"x64\Release\" /Fo"x64\Release\" /Fd"x64\Release\vc100.pdb" /Gd /errorReport:queue Any help would be appreciated, thanks.

A simple allocator change will cut that time down a lot. <pre class="prettyprint"><code>boost::unordered_map<int, int, boost::hash<int>, std::equal_to<int>, boost::fast_pool_allocator<std::pair<const int, int>>> dict; </code></pre> 0.9ms on my system (from 10ms before). This suggests to me that actually, the vast, vast majority of your time is not spent in the hash table at all, but in the allocator. The reason that this is an unfair comparison is because your GC will never collect in such a trivial program, giving it an undue performance advantage, and native allocators do significant caching of free memory- but that'll never come into play in such a trivial example, because you've never allocated or deallocated anything and so there's nothing to cache. Finally, the Boost pool implementation is thread-safe, whereas you never play with threads so the GC can just fall back to a single-threaded implementation, which will be much faster. I resorted to a hand-rolled, non-freeing non-thread-safe pool allocator and got down to 0.525ms for C++ to 0.45ms for C# (on my machine). Conclusion: Your original results were very skewed because of the different memory allocation schemes of the two languages, and once that was resolved, then the difference becomes relatively minimal. A custom hasher (as described in Alexandre's answer) dropped my C++ time to 0.34ms, which is now faster than C#. <pre class="prettyprint"><code>static const int MaxMemorySize = 800000; static int FreedMemory = 0; static int AllocatorCalls = 0; static int DeallocatorCalls = 0; template <typename T> class LocalAllocator { public: std::vector<char>* memory; int* CurrentUsed; typedef T value_type; typedef value_type * pointer; typedef const value_type * const_pointer; typedef value_type & reference; typedef const value_type & const_reference; typedef std::size_t size_type; typedef std::size_t difference_type; template <typename U> struct rebind { typedef LocalAllocator other; }; template <typename U> LocalAllocator(const LocalAllocator& other) { CurrentUsed = other.CurrentUsed; memory = other.memory; } LocalAllocator(std::vector<char>* ptr, int* used) { CurrentUsed = used; memory = ptr; } template<typename U> LocalAllocator(LocalAllocator&& other) { CurrentUsed = other.CurrentUsed; memory = other.memory; } pointer address(reference r) { return &r; } const_pointer address(const_reference s) { return &r; } size_type max_size() const { return MaxMemorySize; } void construct(pointer ptr, value_type&& t) { new (ptr) T(std::move(t)); } void construct(pointer ptr, const value_type & t) { new (ptr) T(t); } void destroy(pointer ptr) { static_cast<T*>(ptr)->~T(); } bool operator==(const LocalAllocator& other) const { return Memory == other.Memory; } bool operator!=(const LocalAllocator&) const { return false; } pointer allocate(size_type count) { AllocatorCalls++; if (*CurrentUsed + (count * sizeof(T)) > MaxMemorySize) throw std::bad_alloc(); if (*CurrentUsed % std::alignment_of<T>::value) { *CurrentUsed += (std::alignment_of<T>::value - *CurrentUsed % std::alignment_of<T>::value); } auto val = &((*memory)[*CurrentUsed]); *CurrentUsed += (count * sizeof(T)); return reinterpret_cast<pointer>(val); } void deallocate(pointer ptr, size_type n) { DeallocatorCalls++; FreedMemory += (n * sizeof(T)); } pointer allocate() { return allocate(sizeof(T)); } void deallocate(pointer ptr) { return deallocate(ptr, 1); } }; int main() { LARGE_INTEGER frequency; // ticks per second LARGE_INTEGER t1, t2; // ticks double elapsedTime; // get ticks per second QueryPerformanceFrequency(&frequency); std::vector<char> memory; int CurrentUsed = 0; memory.resize(MaxMemorySize); struct custom_hash { size_t operator()(int x) const { return x; } }; boost::unordered_map<int, int, custom_hash, std::equal_to<int>, LocalAllocator<std::pair<const int, int>>> dict( std::unordered_map<int, int>().bucket_count(), custom_hash(), std::equal_to<int>(), LocalAllocator<std::pair<const int, int>>(&memory, &CurrentUsed) ); // start timer std::string str; QueryPerformanceCounter(&t1); for (int i=0;i<10000;i++) dict[i]=i; // stop timer QueryPerformanceCounter(&t2); // compute and print the elapsed time in millisec elapsedTime = ((t2.QuadPart - t1.QuadPart) * 1000.0) / frequency.QuadPart; std::cout << elapsedTime << " ms insert time\n"; int input; std::cin >> input; //don't want console to disappear } </code></pre>

Storing a consecutive sequence of numeric integral keys added in ascending order is definitely NOT what hash tables are optimized for. Use an array, or else generate random values. And do some retrievals. Hash tables are highly optimized for retrieval.

c# to c++ dictionary to unordered_map results

Tags:

c++

performance

c#

collections

visual-studio

I've done a few years of c# now, and I'm trying to learn some new stuff. So I decided to have a look at c++, to get to know programming in a different way.

I've been doing loads of reading, but I just started writing some code today.

On my Windows 7/64 bit machine, running VS2010, I created two projects: 1) A c# project that lets me write things the way I'm used to. 2) A c++ "makefile" project that let's me play around, trying to implement the same thing. From what I understand, this ISN'T a .NET project.

I got to trying to populate a dictionary with 10K values. For some reason, the c++ is orders of magnitude slower.

Here's the c# below. Note I put in a function after the time measurement to ensure it wasn't "optimized" away by the compiler:

var freq = System.Diagnostics.Stopwatch.Frequency;

int i;
Dictionary<int, int> dict = new Dictionary<int, int>();
var clock = System.Diagnostics.Stopwatch.StartNew();

for (i = 0; i < 10000; i++)
     dict[i] = i;
clock.Stop();

Console.WriteLine(clock.ElapsedTicks / (decimal)freq * 1000M);
Console.WriteLine(dict.Average(x=>x.Value));
Console.ReadKey(); //Don't want results to vanish off screen

Here's the c++, not much thought has gone into it (trying to learn, right?) int input;

LARGE_INTEGER frequency;        // ticks per second
LARGE_INTEGER t1, t2;           // ticks
double elapsedTime;

// get ticks per second
QueryPerformanceFrequency(&frequency);

int i;
boost::unordered_map<int, int> dict;
// start timer
QueryPerformanceCounter(&t1);

for (i=0;i<10000;i++)
    dict[i]=i;

// stop timer
QueryPerformanceCounter(&t2);

// compute and print the elapsed time in millisec
elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart;
cout << elapsedTime << " ms insert time\n";
int input;
cin >> input; //don't want console to disappear

Now, some caveats. I managed to find this related SO question. One of the guys wrote a long answer mentioning WOW64 skewing the results. I've set the project to release and gone through the "properties" tab of the c++ project, enabling everything that sounded like it would make it fast. Changed the platform to x64, though I'm not sure whether that addresses his wow64 issue. I'm not that experienced with the compiler options, perhaps you guys have more of a clue?

Oh, and the results: c#:0.32ms c++: 8.26ms. This is a bit strange. Have I misinterpreted something about what .Quad means? I copied the c++ timer code from someplace on the web, going through all the boost installation and include/libfile rigmarole. Or perhaps I am actually using different instruments unwittingly? Or there's some critical compile option that I haven't used? Or maybe the c# code is optimized because the average is a constant?

Here's the c++ command line, from the Property page->C/C++->Command Line: /I"C:\Users\Carlos\Desktop\boost_1_47_0" /Zi /nologo /W3 /WX- /MP /Ox /Oi /Ot /GL /D "_MBCS" /Gm- /EHsc /GS- /Gy- /arch:SSE2 /fp:fast /Zc:wchar_t /Zc:forScope /Fp"x64\Release\MakeTest.pch" /Fa"x64\Release\" /Fo"x64\Release\" /Fd"x64\Release\vc100.pdb" /Gd /errorReport:queue

Any help would be appreciated, thanks.

328

asked Aug 07 '11 18:08

Carlos

2 Answers

A simple allocator change will cut that time down a lot.

boost::unordered_map<int, int, boost::hash<int>, std::equal_to<int>, boost::fast_pool_allocator<std::pair<const int, int>>> dict;

0.9ms on my system (from 10ms before). This suggests to me that actually, the vast, vast majority of your time is not spent in the hash table at all, but in the allocator. The reason that this is an unfair comparison is because your GC will never collect in such a trivial program, giving it an undue performance advantage, and native allocators do significant caching of free memory- but that'll never come into play in such a trivial example, because you've never allocated or deallocated anything and so there's nothing to cache.

Finally, the Boost pool implementation is thread-safe, whereas you never play with threads so the GC can just fall back to a single-threaded implementation, which will be much faster.

I resorted to a hand-rolled, non-freeing non-thread-safe pool allocator and got down to 0.525ms for C++ to 0.45ms for C# (on my machine). Conclusion: Your original results were very skewed because of the different memory allocation schemes of the two languages, and once that was resolved, then the difference becomes relatively minimal.

A custom hasher (as described in Alexandre's answer) dropped my C++ time to 0.34ms, which is now faster than C#.

static const int MaxMemorySize = 800000;
static int FreedMemory = 0;
static int AllocatorCalls = 0;
static int DeallocatorCalls = 0;
template <typename T>
class LocalAllocator
{
  public:
      std::vector<char>* memory;
      int* CurrentUsed;
      typedef T value_type;
      typedef value_type * pointer;
      typedef const value_type * const_pointer;
      typedef value_type & reference;
      typedef const value_type & const_reference;
      typedef std::size_t size_type;
      typedef std::size_t difference_type;

    template <typename U> struct rebind { typedef LocalAllocator<U> other; };

    template <typename U>
    LocalAllocator(const LocalAllocator<U>& other) {
        CurrentUsed = other.CurrentUsed;
        memory = other.memory;
    }
    LocalAllocator(std::vector<char>* ptr, int* used) {
        CurrentUsed = used;
        memory = ptr;
    }
    template<typename U> LocalAllocator(LocalAllocator<U>&& other) {
        CurrentUsed = other.CurrentUsed;
        memory = other.memory;
    }
    pointer address(reference r) { return &r; }
    const_pointer address(const_reference s) { return &r; }
    size_type max_size() const { return MaxMemorySize; }
    void construct(pointer ptr, value_type&& t) { new (ptr) T(std::move(t)); }
    void construct(pointer ptr, const value_type & t) { new (ptr) T(t); }
    void destroy(pointer ptr) { static_cast<T*>(ptr)->~T(); }

    bool operator==(const LocalAllocator& other) const { return Memory == other.Memory; }
    bool operator!=(const LocalAllocator&) const { return false; }

    pointer allocate(size_type count) {
        AllocatorCalls++;
        if (*CurrentUsed + (count * sizeof(T)) > MaxMemorySize)
            throw std::bad_alloc();
        if (*CurrentUsed % std::alignment_of<T>::value) {
            *CurrentUsed += (std::alignment_of<T>::value - *CurrentUsed % std::alignment_of<T>::value);
        }
        auto val = &((*memory)[*CurrentUsed]);
        *CurrentUsed += (count * sizeof(T));
        return reinterpret_cast<pointer>(val);
    }
    void deallocate(pointer ptr, size_type n) {
        DeallocatorCalls++;
        FreedMemory += (n * sizeof(T));
    }

    pointer allocate() {
        return allocate(sizeof(T));
    }
    void deallocate(pointer ptr) {
        return deallocate(ptr, 1);
    }
};
int main() {
    LARGE_INTEGER frequency;        // ticks per second
    LARGE_INTEGER t1, t2;           // ticks
    double elapsedTime;

    // get ticks per second
    QueryPerformanceFrequency(&frequency);
    std::vector<char> memory;
    int CurrentUsed = 0;
    memory.resize(MaxMemorySize);

    struct custom_hash {
        size_t operator()(int x) const { return x; }
    };
    boost::unordered_map<int, int, custom_hash, std::equal_to<int>, LocalAllocator<std::pair<const int, int>>> dict(
        std::unordered_map<int, int>().bucket_count(),
        custom_hash(),
        std::equal_to<int>(),
        LocalAllocator<std::pair<const int, int>>(&memory, &CurrentUsed)
    );

    // start timer
    std::string str;
    QueryPerformanceCounter(&t1);

    for (int i=0;i<10000;i++)
        dict[i]=i;

    // stop timer
    QueryPerformanceCounter(&t2);

    // compute and print the elapsed time in millisec
    elapsedTime = ((t2.QuadPart - t1.QuadPart) * 1000.0) / frequency.QuadPart;
    std::cout << elapsedTime << " ms insert time\n";
    int input;
    std::cin >> input; //don't want console to disappear
}

answered Oct 30 '22 16:10

Puppy

Storing a consecutive sequence of numeric integral keys added in ascending order is definitely NOT what hash tables are optimized for.

Use an array, or else generate random values.

And do some retrievals. Hash tables are highly optimized for retrieval.

answered Oct 30 '22 18:10

Ben Voigt

Related questions
                            
                                ViewData and ViewModel in MVC ASP.NET
                            
                                Simple Linq expression won't compile
                            
                                What is the recommended identity generation approach in Entity framework?
                            
                                Abstract method declaration - virtual?
                            
                                Pass complex object with redirect in ASP.NET MVC?
                            
                                Does WCF support WS-Security with SOAP 1.1?
                            
                                Lucene.Net writing/reading synchronization
                            
                                Protection level of a struct field within a class
                            
                                Should I compress in-memory C# objects for better performance?
                            
                                NHibernate SaveOrUpdate Child Collection Not Updated With Identity Ids
                            
                                Look if a method is called inside a method using reflection
                            
                                Bitconverter for Java
                            
                                WPF Minimize on Taskbar Click
                            
                                Best way to create a plugin environment in .NET
                            
                                Open a text file with WPF
                            
                                C# Reading the registry: ProductID returns null in x86 targeted app. "Any CPU" works fine
                            
                                Increasing the Lifetime element for EWS Streaming Subscription Connection
                            
                                Iterating through an Excel range
                            
                                IDataReader - Any way to get the total rows?
                            
                                Using FileSystemWatcher with multiple files

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With