Algorithm to unify contiguous chunks in a collection of chunks

I'm creating a pre-allocator with dynamic memory chunk size, and I need to unify contiguous memory chunks.

struct Chunk // Chunk of memory
{
    Ptr begin, end; // [begin, end) range
};

struct PreAlloc
{
    std::vector<Chunk> chunks; // I need to unify contiguous chunks here
    ...
};

I tried a naive solution: after sorting the chunks by their begin, it did a single pass through the vector, merging whenever the next chunk's begin was equal to the current chunk's end. I'm sure it can be improved.
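
Something along these lines (a rough sketch of that pass, not my exact code; it reuses the Chunk struct above and assumes Ptr is an ordinary pointer type):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Assumes the Chunk struct above and something like: using Ptr = char*;
void unifyNaive(std::vector<Chunk>& chunks)
{
    if (chunks.empty()) return;

    // Sort by begin, then sweep once, extending the last kept chunk
    // whenever the next chunk starts exactly where it ends.
    std::sort(chunks.begin(), chunks.end(),
              [](const Chunk& a, const Chunk& b) { return a.begin < b.begin; });

    std::vector<Chunk> merged;
    merged.push_back(chunks.front());
    for (std::size_t i = 1; i < chunks.size(); ++i)
    {
        if (chunks[i].begin == merged.back().end)
            merged.back().end = chunks[i].end; // contiguous: extend
        else
            merged.push_back(chunks[i]);       // gap: keep as a new chunk
    }
    chunks = std::move(merged);
}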

Is there a good algorithm to unify contiguous ranges?

Information:

  • Chunks can never "overlap".
  • Chunks can have any size greater than 0.
  • Performance is the most important factor.
asked Aug 22 '13 by Vittorio Romeo

1 Answer

NOTE: there was an error in my original algorithm, where I only considered blocks to the left of the current block.

Use two associative tables (e.g. unordered_map), one mapping the begin address to the Chunk and another mapping the end address to the Chunk. This lets you find the neighbouring blocks quickly. Alternatively, you can change the Chunk struct to store a pointer/id/whatever to the neighbouring Chunk, plus a flag to tell whether it's free.

The algorithm scans the vector of chunks once while maintaining the invariant that no two chunks stored in the tables are contiguous: for each chunk, if there is a neighbour to the left, merge them; if there is a neighbour to the right, merge them. At the end, just collect the remaining chunks.

Here's the code:

#include <unordered_map>
#include <vector>
using namespace std;

void unify(vector<Chunk>& chunks)
{
    unordered_map<Ptr, Chunk> begins(chunks.size() * 1.25); // tweak this
    unordered_map<Ptr, Chunk> ends(chunks.size() * 1.25); // tweak this

    for (Chunk c : chunks) {        
        // check the left
        auto left = ends.find(c.begin);
        if (left != ends.end()) { // found something to the left
            Chunk neighbour = left->second;
            c.begin = neighbour.begin;
            begins.erase(neighbour.begin);
            ends.erase(left);
        }
        // check the right
        auto right = begins.find(c.end);
        if (right != begins.end()) { // found something to the right
            Chunk neighbour = right->second;
            c.end = neighbour.end;
            begins.erase(right);
            ends.erase(neighbour.end);
        }

        begins[c.begin] = c;
        ends[c.end] = c;
    }
    chunks.clear();
    for (auto x : begins)
        chunks.push_back(x.second);
}

The algorithm has O(n) complexity assuming constant-time access to the begins and ends tables (which is roughly what you get if you don't trigger rehashing, hence the "tweak this" comments). There are quite a few ways to implement the associative tables, so make sure to try a few alternatives; as pointed out in the comment by Ben Jackson, a hash table doesn't always make good use of the cache, so even a sorted vector with binary searches might be faster (see the sketch below).
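
For example, a neighbour lookup over a vector kept sorted by begin could be a binary search (just a sketch; findByBegin is a made-up helper name, and it reuses the Chunk/Ptr types from the question):

#include <algorithm>
#include <vector>

// Sketch: in a vector kept sorted by 'begin', locate the chunk that starts
// exactly at 'addr'. Returns chunks.end() if no such chunk exists.
std::vector<Chunk>::iterator findByBegin(std::vector<Chunk>& chunks, Ptr addr)
{
    auto it = std::lower_bound(chunks.begin(), chunks.end(), addr,
                               [](const Chunk& c, Ptr p) { return c.begin < p; });
    return (it != chunks.end() && it->begin == addr) ? it : chunks.end();
}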

If you can change the Chunk structure to store left/right pointers, you get guaranteed O(1) lookup/insert/remove. Assuming you are doing this to consolidate free chunks of memory, the left/right checking can be done in O(1) during the free() call, so there is no need for a separate consolidation pass afterwards.
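
A sketch of that variant (the field and function names here are illustrative, it assumes Ptr is a raw pointer type, and it ignores how the absorbed Chunk nodes themselves get recycled):

// A modified Chunk carrying links to its physical neighbours.
struct Chunk
{
    Ptr    begin, end;     // [begin, end)
    Chunk* prev = nullptr; // physically adjacent chunk to the left, if any
    Chunk* next = nullptr; // physically adjacent chunk to the right, if any
    bool   free = false;   // is this chunk currently unallocated?
};

// O(1) coalescing done at free() time: merge with free physical neighbours.
// (A real allocator would also return the absorbed Chunk nodes to a pool.)
void coalesceOnFree(Chunk* c)
{
    c->free = true;

    if (c->prev && c->prev->free)      // fold c into its left neighbour
    {
        c->prev->end  = c->end;
        c->prev->next = c->next;
        if (c->next) c->next->prev = c->prev;
        c = c->prev;
    }
    if (c->next && c->next->free)      // absorb the right neighbour into c
    {
        Chunk* right = c->next;
        c->end  = right->end;
        c->next = right->next;
        if (right->next) right->next->prev = c;
    }
}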

answered by DanielKO