Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Smart or raw pointers when implementing data structures?

I have been reading and experimenting with the standard library's smart pointers, unique_ptr and shared_ptr and although they are obviously great replacements for a lot of cases where raw pointers might be deemed dangerous I am unsure of their use when implementing data structures.

To experiment, I have written an example of a hash map which uses shared_ptr - which according to the Meyer's Effective Modern C++ are roughly double the size of a unique_ptr. For this reason I would like to use unique_ptr but I am kind of stumped due to what I am performing in the Add function, updating and copying.

Does anyone have any suggestions for this problem? Should data structures remain to be written using raw pointers?

#pragma once
#include "core.h"

const int TABLE_SIZE = 256;

template<typename K>
class HashKey {
public:
    unsigned long operator()(const K& p_key) const {
        return (p_key) % TABLE_SIZE;
    }
};

template<typename K, typename T>
class HashNode {
public:
    K m_key;
    T m_value;
    std::shared_ptr<HashNode> next = nullptr;
};


template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    std::array< std::shared_ptr< HashNode<K, T> >, 128 > m_table;
    F m_hash_function;
    int m_elem_count{ 0 };

    void Add(K p_key, T p_value);
};

template<typename K, typename T, typename F = HashKey<K>>
void HashMap<K, T, F>::Add(K p_key, T p_value)
{
    unsigned long key = m_hash_function(p_key);

    std::shared_ptr<HashNode<K, T>> new_node = std::make_shared<HashNode<K, T>>();
    new_node->m_key = p_key;
    new_node->m_value = p_value;


    if (m_table[key] == nullptr) {
        /* only item in the bucket */
        m_table[key] = std::move(new_node);
        m_elem_count++;
    }
    else {
        /* check if item exists so it is replaced */
        std::shared_ptr< HashNode<K, T> > current = m_table[key];
        std::shared_ptr< HashNode<K, T> > previous = m_table[key];
        while (current != nullptr && p_key != current->m_key ) {
            previous = current;
            current = current->next;
        }
        if (current == nullptr) {
            previous->next = new_node;
            //current = new_node;
            m_elem_count++;
        }
        else {
            current->m_value = p_value;
        }

    }
}

void TestHashMap() {

    HashMap<int, std::string> hash_map;

    hash_map.Add(1, "one");
    hash_map.Add(2, "does");
    hash_map.Add(3, "not");
    hash_map.Add(50, "simply");
    hash_map.Add(11, "waltz");
    hash_map.Add(11, "into");
    hash_map.Add(191, "mordor");

    std::cout << hash_map.m_elem_count << std::endl;
}
like image 291
dgouder Avatar asked Jan 07 '16 16:01

dgouder


People also ask

Why are smart pointers better than raw pointers?

Smart pointers are class objects that behave like raw pointers but manage objects that are new and when or whether to delete them— smart pointers automatically delete the managed object at the appropriate time.

When should smart pointers be used?

Use when you want to assign one raw pointer to multiple owners, for example, when you return a copy of a pointer from a container but want to keep the original. The raw pointer is not deleted until all shared_ptr owners have gone out of scope or have otherwise given up ownership.

How are smart pointers implemented?

In C++, a smart pointer is implemented as a template class that mimics, by means of operator overloading, the behaviors of a traditional (raw) pointer, (e.g. dereferencing, assignment) while providing additional memory management features.

Why pointers are used in data structure?

Pointers are used to store and manage the addresses of dynamically allocated blocks of memory. Such blocks are used to store data objects or arrays of objects. Most structured and object-oriented languages provide an area of memory, called the heap or free store, from which objects are dynamically allocated.


1 Answers

The choice of smart pointer depends on how your data structure "owns" the heap-allocated objects.

If you need to simply observe, and not own an object (independently of whether it is heap-allocated or not), a raw pointer, a reference or an std::reference_wrapper is the appropriate choice.

If you need unique ownership (at most one owner of the heap-allocated object), then use std::unique_ptr. It has no additional time/memory overhead.

If you need shared ownership (any number of owners of the heap-allocated object), then use std::shared_ptr. It results in additional time/memory overhead because an additional pointer (to the reference count metadata) has to be stored, and because accessing it is guaranteed to be thread-safe.

There's no need to use std::unique_ptr (in place of a raw pointer) unless you actually need to own the object.

Assuming you need to own the object, there's no need to use std::shared_ptr (in place of an std::unique_ptr) unless you actually need shared ownership semantics.


In your case, it looks like you have a maximum of heap nodes in your HashMap. Therefore, I'm assuming that you want the HashMap instance to be the unique owner of the nodes.

What type should you use?

template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    std::array</* ? */, 128 > m_table;
    // ...
};

You have two options:

  1. If you want to store the objects with an heap indirection, use std::unique_ptr, as the unique owner of these heap-allocated object is and will always be the HashMap instance.

  2. If you want to store the objects directly into the HashMap, with no heap indirection, then do not use any pointer at all. This could lead to very big HashMap instances. Interface for accessing the next nodes becomes cumbersome.


Option 1 (store nodes in the heap):

This is the most common, and probably best option.

template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    std::array<std::unique_ptr<HashNode<K, T>>, 128 > m_table;
    // ...
};

This will result in lighter (in terms of memory footprint) HashMap instances.

Note: using an std::vector in place of an std::array will reduce the size of HashMap significantly, but will introduce an additional heap indirection. This is the common way of implementing a similar data structure. You generally want the HashMap instance to be as lightweight as possible, so that it can be copied/moved/stored efficiently.

There's no need to use smart pointers to connect the nodes between each other, as the nodes are owned exclusively by HashMap. A raw pointer will be sufficient.

template<typename K, typename T>
class HashNode {
public:
    // ...
    HashNode* next_ptr = nullptr;
    auto& next() 
    { 
        assert(next_ptr != nullptr);
        return *next_ptr;
    }
};

The code above will work fine, assuming that the HashMap is still alive when accessing next.


Option 2 (store nodes in the map instance):

template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    std::array<HashNode<K, T>, 128 > m_table;
    // ...
};

HashMap instances may be enormous, depending on the size of HashNode<K, T>.

If you choose to store the nodes directly into the HashMap with no heap indirection, you will have to use an index to access the internal array, as moving/copying the HashMap around will change the memory address of the nodes.

template<typename K, typename T>
class HashNode {
public:
    // ...
    int next_index = -1;
    auto& next(HashMap& map) 
    { 
        assert(next_index != -1);
        return map.m_table[next_index]; 
    }
};
like image 190
Vittorio Romeo Avatar answered Oct 12 '22 08:10

Vittorio Romeo