Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data structure with fast contiguous ranges retrieval

Imagine data structure, that manipulates some contiguous container, and allows quick retrieval of contiguous ranges of indices, within this array, that contains data (and probably free ranges too). Let's call this ranges "blocks". Each block knows its head and tail index:

struct Block
{
    size_t begin;
    size_t end;
}

When we manipulating array, our data structure updates blocks:

    array view          blocks [begin, end]
--------------------------------------------------------------
0 1 2 3 4 5 6 7 8 9     [0, 9]

pop 2                   block 1 splitted

0 1 _ 3 4 5 6 7 8 9     [0, 1] [3, 9]

pop 7, 8                block 2 splitted

0 1 _ 3 4 5 6 _ _ 9     [0, 1] [3, 6] [9, 9]

push 7                  changed end of block 3

0 1 _ 3 4 5 6 7 _ 9     [0, 1] [3, 7] [9, 9]

push 5                  error: already in

0 1 _ 3 4 5 6 7 _ 9     [0, 1] [3, 7] [9, 9]

push 2                  blocks 1, 2 merged

0 1 2 3 4 5 6 7 _ 9     [0, 7] [9, 9]

Even before profiling, we know that blocks retrieval speed will be cornerstone of application performance. Basically usage is:

  • very often retrieval of contiguous blocks
  • quite rare insertions/deletions
  • most time we want number of blocks be minimal (prevent fragmentation)

What we have already tried:

  1. std::vector<bool> + std::list<Block*>. On every change: write true/false to vector, then traverse it in for loop and re-generate list. On every query of blocks return list. Slower than we wanted.

  2. std::list<Block*> update list directly, so no traversing. Return list. Much code to debug/test.

Questions:

  1. Is that data structure has some generic name?
  2. Is there already such data structures implemented (debugged and tested)?
  3. If no, what can you advice on fast and robust implementation of such data structure?

Sorry if my explanation is not quite clear.

Edit

Typical application for this container is managing buffers: either system or GPU memory. In case of GPU we can store huge amounts of data in single vertex buffer, and then update/invalidate some regions. On each draw call we must know first and last index of each valid block in buffer to draw (very often, tenth to hundreds times per second) and sometimes (once a second) we must insert/remove blocks of data.

Another application is a custom "block memory allocator". For that purpose, similar data structure implemented in "Alexandrescu A. - Modern C++ Design" book via intrusive linked list. I'm looking for better options.

like image 809
Ivan Aksamentov - Drop Avatar asked Aug 20 '13 22:08

Ivan Aksamentov - Drop


2 Answers

What I see here is a simple binary tree.
You have pairs (blocks) with a begin and an end indices, that is, pairs (a,b) where a <= b. So the set of blocks can be easily ordered and stored in a search-binary-tree.
Searching the block wich corresponds to a given number is easy (Just the tipical bynary-tree-search). So when you delete a number from the array, you need to search the block that corresponds to the number and split it in two new blocks. Note that all blocks are leaves, the internal nodes are the intervals wich the two child nodes forms.
Insertion on the other hand means searching the block, and test its brothers to know if the brothers have to be collapsed. This should be done recursively up through the tree.

like image 58
Manu343726 Avatar answered Nov 14 '22 18:11

Manu343726


You may want to try a tree like structure, either a simple red-black tree or a B+ tree.

like image 23
Jesus Ramos Avatar answered Nov 14 '22 18:11

Jesus Ramos