Have an idea for a data structure that would be useful for a research project and want to see if it already has a name or STL equivalent:
Create a sparse array of size n
as an array of sqrt(n)
pointers.
To insert an item at i
, go to the pointer at i/sqrt(n)
. If that pointer is null
, assign it to a new array of size sqrt(n)
. Insert your item in the new array at position i mod sqrt(n)
.
The advantage is simple: it allows you to create a large array that initially takes only O(sqrt(n))
space. You can access any element in constant time, and it allows you to fill portions of the array without needing to allocate space for all n
positions.
This may already be used for hash-table bucketing, and I have another application in mind (memoizing contigs in a very large DNA sequence). My question is: is there a name for this? Any common implementation I can use?
This data structure is closely related to the hashed array tree (HAT) data structure. A hashed array tree is structured like what you've described above - you have a top-level array of size √n, where each entry is a pointer to an array with √n elements. This makes insertions and lookups reasonably fast and has only O(√n) memory overhead compared with the O(n) memory overhead of a traditional dynamic array.
Your structure differs from the HAT in a few ways. For starters, your structure does not appear to have a way to grow the structure if you need more space, whereas the HAT is designed to be growable. Additionally, your structure allows for random insertions, whereas the HAT is designed for sequential insertions. That said, there's no fundamental reason that the HAT has to behave this way, so you can think of your data structure as a slight modification on the HAT. In fact, you might want to look at how the HAT grows in order to make your data structure support growth.
Hope this helps!
Well, I would call that a square matrix with row vectors. If not fully populated, it's a sparse square matrix with row vectors.1.
There are two optimizations at work here. The first is the sparse memory allocation, and the second is the strength reduction of the row subscript calculation. This optimization is probably not quite so important with superscaler CPU's executing instructions out-of-order, and in an age where compilers routinely do global flow optimizations.
But it does allow row indexing via a pointer dereference rather than a multiply by the row size.
1. However, in numerical analysis, a sparse matrix is one that's mostly zeroes, and so a sparse data structure formally has the same definition. In this case, it's more of a partial data structure, but there just aren't, to my knowledge, accepted terms for these things.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With