Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the name of this array data structure?

Have an idea for a data structure that would be useful for a research project and want to see if it already has a name or STL equivalent:

Create a sparse array of size n as an array of sqrt(n) pointers.

To insert an item at i, go to the pointer at i/sqrt(n). If that pointer is null, assign it to a new array of size sqrt(n). Insert your item in the new array at position i mod sqrt(n).

The advantage is simple: it allows you to create a large array that initially takes only O(sqrt(n)) space. You can access any element in constant time, and it allows you to fill portions of the array without needing to allocate space for all n positions.

This may already be used for hash-table bucketing, and I have another application in mind (memoizing contigs in a very large DNA sequence). My question is: is there a name for this? Any common implementation I can use?

like image 308
Nathan Fig Avatar asked Apr 18 '13 16:04

Nathan Fig


2 Answers

This data structure is closely related to the hashed array tree (HAT) data structure. A hashed array tree is structured like what you've described above - you have a top-level array of size √n, where each entry is a pointer to an array with √n elements. This makes insertions and lookups reasonably fast and has only O(√n) memory overhead compared with the O(n) memory overhead of a traditional dynamic array.

Your structure differs from the HAT in a few ways. For starters, your structure does not appear to have a way to grow the structure if you need more space, whereas the HAT is designed to be growable. Additionally, your structure allows for random insertions, whereas the HAT is designed for sequential insertions. That said, there's no fundamental reason that the HAT has to behave this way, so you can think of your data structure as a slight modification on the HAT. In fact, you might want to look at how the HAT grows in order to make your data structure support growth.

Hope this helps!

like image 157
templatetypedef Avatar answered Oct 13 '22 08:10

templatetypedef


Well, I would call that a square matrix with row vectors. If not fully populated, it's a sparse square matrix with row vectors.1.

There are two optimizations at work here. The first is the sparse memory allocation, and the second is the strength reduction of the row subscript calculation. This optimization is probably not quite so important with superscaler CPU's executing instructions out-of-order, and in an age where compilers routinely do global flow optimizations.

But it does allow row indexing via a pointer dereference rather than a multiply by the row size.


1. However, in numerical analysis, a sparse matrix is one that's mostly zeroes, and so a sparse data structure formally has the same definition. In this case, it's more of a partial data structure, but there just aren't, to my knowledge, accepted terms for these things.

like image 36
DigitalRoss Avatar answered Oct 13 '22 08:10

DigitalRoss