I have been reading and experimenting with the standard library's smart pointers, <code>unique_ptr</code> and <code>shared_ptr</code> and although they are obviously great replacements for a lot of cases where raw pointers might be deemed dangerous I am unsure of their use when implementing data structures. To experiment, I have written an example of a hash map which uses <code>shared_ptr</code> - which according to the Meyer's Effective Modern C++ are roughly double the size of a <code>unique_ptr</code>. For this reason I would like to use unique_ptr but I am kind of stumped due to what I am performing in the Add function, updating and copying. Does anyone have any suggestions for this problem? Should data structures remain to be written using raw pointers? <pre class="prettyprint"><code>#pragma once #include "core.h" const int TABLE_SIZE = 256; template<typename K> class HashKey { public: unsigned long operator()(const K& p_key) const { return (p_key) % TABLE_SIZE; } }; template<typename K, typename T> class HashNode { public: K m_key; T m_value; std::shared_ptr<HashNode> next = nullptr; }; template<typename K, typename T, typename F = HashKey<K>> class HashMap { public: std::array< std::shared_ptr< HashNode<K, T> >, 128 > m_table; F m_hash_function; int m_elem_count{ 0 }; void Add(K p_key, T p_value); }; template<typename K, typename T, typename F = HashKey<K>> void HashMap<K, T, F>::Add(K p_key, T p_value) { unsigned long key = m_hash_function(p_key); std::shared_ptr<HashNode<K, T>> new_node = std::make_shared<HashNode<K, T>>(); new_node->m_key = p_key; new_node->m_value = p_value; if (m_table[key] == nullptr) { /* only item in the bucket */ m_table[key] = std::move(new_node); m_elem_count++; } else { /* check if item exists so it is replaced */ std::shared_ptr< HashNode<K, T> > current = m_table[key]; std::shared_ptr< HashNode<K, T> > previous = m_table[key]; while (current != nullptr && p_key != current->m_key ) { previous = current; current = current->next; } if (current == nullptr) { previous->next = new_node; //current = new_node; m_elem_count++; } else { current->m_value = p_value; } } } void TestHashMap() { HashMap<int, std::string> hash_map; hash_map.Add(1, "one"); hash_map.Add(2, "does"); hash_map.Add(3, "not"); hash_map.Add(50, "simply"); hash_map.Add(11, "waltz"); hash_map.Add(11, "into"); hash_map.Add(191, "mordor"); std::cout << hash_map.m_elem_count << std::endl; } </code></pre>

The choice of smart pointer depends on how your data structure "owns" the heap-allocated objects. If you need to simply observe, and not own an object (independently of whether it is heap-allocated or not), a raw pointer, a reference or an <code>std::reference_wrapper</code> is the appropriate choice. If you need unique ownership (at most one owner of the heap-allocated object), then use <code>std::unique_ptr</code>. It has no additional time/memory overhead. If you need shared ownership (any number of owners of the heap-allocated object), then use <code>std::shared_ptr</code>. It results in additional time/memory overhead because an additional pointer (to the reference count metadata) has to be stored, and because accessing it is guaranteed to be thread-safe. There's no need to use <code>std::unique_ptr</code> (in place of a raw pointer) unless you actually need to own the object. Assuming you need to own the object, there's no need to use <code>std::shared_ptr</code> (in place of an <code>std::unique_ptr</code>) unless you actually need shared ownership semantics. <hr> In your case, it looks like you have a maximum of heap nodes in your <code>HashMap</code>. Therefore, I'm assuming that you want the <code>HashMap</code> instance to be the unique owner of the nodes. What type should you use? <pre class="prettyprint"><code>template<typename K, typename T, typename F = HashKey<K>> class HashMap { public: std::array</* ? */, 128 > m_table; // ... }; </code></pre> You have two options: <ol> <li>If you want to store the objects with an heap indirection, use <code>std::unique_ptr</code>, as the unique owner of these heap-allocated object is and will always be the <code>HashMap</code> instance.</li> <li>If you want to store the objects directly into the <code>HashMap</code>, with no heap indirection, then do not use any pointer at all. This could lead to very big <code>HashMap</code> instances. Interface for accessing the <code>next</code> nodes becomes cumbersome.</li> </ol> <hr> Option 1 (store nodes in the heap): This is the most common, and probably best option. <pre class="prettyprint"><code>template<typename K, typename T, typename F = HashKey<K>> class HashMap { public: std::array<std::unique_ptr<HashNode<K, T>>, 128 > m_table; // ... }; </code></pre> This will result in lighter (in terms of memory footprint) <code>HashMap</code> instances. Note: using an <code>std::vector</code> in place of an <code>std::array</code> will reduce the size of <code>HashMap</code> significantly, but will introduce an additional heap indirection. This is the common way of implementing a similar data structure. You generally want the <code>HashMap</code> instance to be as lightweight as possible, so that it can be copied/moved/stored efficiently. There's no need to use smart pointers to connect the nodes between each other, as the nodes are owned exclusively by <code>HashMap</code>. A raw pointer will be sufficient. <pre class="prettyprint"><code>template<typename K, typename T> class HashNode { public: // ... HashNode* next_ptr = nullptr; auto& next() { assert(next_ptr != nullptr); return *next_ptr; } }; </code></pre> The code above will work fine, assuming that the <code>HashMap</code> is still alive when accessing <code>next</code>. <hr> Option 2 (store nodes in the map instance): <pre class="prettyprint"><code>template<typename K, typename T, typename F = HashKey<K>> class HashMap { public: std::array<HashNode<K, T>, 128 > m_table; // ... }; </code></pre> <code>HashMap</code> instances may be enormous, depending on the size of <code>HashNode<K, T></code>. If you choose to store the nodes directly into the <code>HashMap</code> with no heap indirection, you will have to use an index to access the internal array, as moving/copying the <code>HashMap</code> around will change the memory address of the nodes. <pre class="prettyprint"><code>template<typename K, typename T> class HashNode { public: // ... int next_index = -1; auto& next(HashMap& map) { assert(next_index != -1); return map.m_table[next_index]; } }; </code></pre>

Smart or raw pointers when implementing data structures?

Tags:

c++

pointers

data-structures

I have been reading and experimenting with the standard library's smart pointers, unique_ptr and shared_ptr and although they are obviously great replacements for a lot of cases where raw pointers might be deemed dangerous I am unsure of their use when implementing data structures.

To experiment, I have written an example of a hash map which uses shared_ptr - which according to the Meyer's Effective Modern C++ are roughly double the size of a unique_ptr. For this reason I would like to use unique_ptr but I am kind of stumped due to what I am performing in the Add function, updating and copying.

Does anyone have any suggestions for this problem? Should data structures remain to be written using raw pointers?

Click to copy

#pragma once
#include "core.h"

const int TABLE_SIZE = 256;

template<typename K>
class HashKey {
public:
    unsigned long operator()(const K& p_key) const {
        return (p_key) % TABLE_SIZE;
    }
};

template<typename K, typename T>
class HashNode {
public:
    K m_key;
    T m_value;
    std::shared_ptr<HashNode> next = nullptr;
};


template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    std::array< std::shared_ptr< HashNode<K, T> >, 128 > m_table;
    F m_hash_function;
    int m_elem_count{ 0 };

    void Add(K p_key, T p_value);
};

template<typename K, typename T, typename F = HashKey<K>>
void HashMap<K, T, F>::Add(K p_key, T p_value)
{
    unsigned long key = m_hash_function(p_key);

    std::shared_ptr<HashNode<K, T>> new_node = std::make_shared<HashNode<K, T>>();
    new_node->m_key = p_key;
    new_node->m_value = p_value;


    if (m_table[key] == nullptr) {
        /* only item in the bucket */
        m_table[key] = std::move(new_node);
        m_elem_count++;
    }
    else {
        /* check if item exists so it is replaced */
        std::shared_ptr< HashNode<K, T> > current = m_table[key];
        std::shared_ptr< HashNode<K, T> > previous = m_table[key];
        while (current != nullptr && p_key != current->m_key ) {
            previous = current;
            current = current->next;
        }
        if (current == nullptr) {
            previous->next = new_node;
            //current = new_node;
            m_elem_count++;
        }
        else {
            current->m_value = p_value;
        }

    }
}

void TestHashMap() {

    HashMap<int, std::string> hash_map;

    hash_map.Add(1, "one");
    hash_map.Add(2, "does");
    hash_map.Add(3, "not");
    hash_map.Add(50, "simply");
    hash_map.Add(11, "waltz");
    hash_map.Add(11, "into");
    hash_map.Add(191, "mordor");

    std::cout << hash_map.m_elem_count << std::endl;
}

291

asked Jan 07 '16 16:01

dgouder

1 Answers

The choice of smart pointer depends on how your data structure "owns" the heap-allocated objects.

If you need to simply observe, and not own an object (independently of whether it is heap-allocated or not), a raw pointer, a reference or an std::reference_wrapper is the appropriate choice.

If you need unique ownership (at most one owner of the heap-allocated object), then use std::unique_ptr. It has no additional time/memory overhead.

If you need shared ownership (any number of owners of the heap-allocated object), then use std::shared_ptr. It results in additional time/memory overhead because an additional pointer (to the reference count metadata) has to be stored, and because accessing it is guaranteed to be thread-safe.

There's no need to use std::unique_ptr (in place of a raw pointer) unless you actually need to own the object.

Assuming you need to own the object, there's no need to use std::shared_ptr (in place of an std::unique_ptr) unless you actually need shared ownership semantics.

In your case, it looks like you have a maximum of heap nodes in your HashMap. Therefore, I'm assuming that you want the HashMap instance to be the unique owner of the nodes.

What type should you use?

Click to copy

template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    std::array</* ? */, 128 > m_table;
    // ...
};

You have two options:

If you want to store the objects with an heap indirection, use std::unique_ptr, as the unique owner of these heap-allocated object is and will always be the HashMap instance.
If you want to store the objects directly into the HashMap, with no heap indirection, then do not use any pointer at all. This could lead to very big HashMap instances. Interface for accessing the next nodes becomes cumbersome.

Option 1 (store nodes in the heap):

This is the most common, and probably best option.

Click to copy

template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    std::array<std::unique_ptr<HashNode<K, T>>, 128 > m_table;
    // ...
};

This will result in lighter (in terms of memory footprint) HashMap instances.

Note: using an std::vector in place of an std::array will reduce the size of HashMap significantly, but will introduce an additional heap indirection. This is the common way of implementing a similar data structure. You generally want the HashMap instance to be as lightweight as possible, so that it can be copied/moved/stored efficiently.

There's no need to use smart pointers to connect the nodes between each other, as the nodes are owned exclusively by HashMap. A raw pointer will be sufficient.

Click to copy

template<typename K, typename T>
class HashNode {
public:
    // ...
    HashNode* next_ptr = nullptr;
    auto& next() 
    { 
        assert(next_ptr != nullptr);
        return *next_ptr;
    }
};

The code above will work fine, assuming that the HashMap is still alive when accessing next.

Option 2 (store nodes in the map instance):

Click to copy

template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    std::array<HashNode<K, T>, 128 > m_table;
    // ...
};

HashMap instances may be enormous, depending on the size of HashNode<K, T>.

If you choose to store the nodes directly into the HashMap with no heap indirection, you will have to use an index to access the internal array, as moving/copying the HashMap around will change the memory address of the nodes.

Click to copy

template<typename K, typename T>
class HashNode {
public:
    // ...
    int next_index = -1;
    auto& next(HashMap& map) 
    { 
        assert(next_index != -1);
        return map.m_table[next_index]; 
    }
};

190

answered Oct 12 '22 08:10

Vittorio Romeo

Related questions
                            
                                Any API to prevent Windows 8 from going into connected standby mode?
                            
                                Why may vector.begin() not equal to &vector[0]?
                            
                                One line solution for unused out parameter reference
                            
                                initializer_list immutable nature leads to excessive copying
                            
                                How to use ~/.bash_profile environment variables when using "Run Script" in "Build Phases" for XCode 6.1?
                            
                                Can default function arguments "fill in" for expanded parameter packs?
                            
                                Is it never truly safe to reinterpret_cast input into std::unique_ptr?
                            
                                What is the altered search path (LOAD_WITH_ALTERED_SEARCH_PATH) in LoadLibraryEx()
                            
                                Is there any use for named parameters into template template parameters
                            
                                compiler can't find "aligned_alloc" function
                            
                                what is 'class' in 'class DataType* Variable' in Unreal Engine Shooter Game Sample
                            
                                CMake cannot find a static library using relative file paths
                            
                                "Most important const" with conditional expression?
                            
                                How to parameterize the number of parameters of a constructor?
                            
                                How to check for the existence of a subscript operator?
                            
                                How to get a settings storage path in a cross-platform way in Qt?
                            
                                Do C++11 delegated ctors perform worse than C++03 ctors calling init functions?
                            
                                Does a SFINAEd-out function shadows an explicitly imported overload from the base class
                            
                                What is the use of first template parameter in priority_queue
                            
                                Variables declared by &&

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Smart or raw pointers when implementing data structures?

Tags:

c++

pointers

data-structures

dgouder

People also ask

1 Answers

Vittorio Romeo

Recent Activity

Donate For Us