Efficiently querying a B+ Tree holding multidimensional data

Question

I have a collection of tuples (x,y) of 64-bit integers that make up my dataset. I have, say, trillions of these tuples; it is not feasible to keep the dataset in memory on any machine on earth. However, it is quite reasonable to store them on disk.

I have an on-disk store (a B+-tree) that allow for the quick, and concurrent, querying of data in a single dimension. However, some of my queries rely on both dimensions.

Query examples:

Find the tuple whose x is greater than or equal than some given value
Find the tuple whose x is as small as possible s.t. it's y is greater than or equal to some given value
Find the tuple whose x is as small as possible s.t. it's y is less than or equal to some given value
Perform maintenance operations (insert some tuple, remove some tuple)

The best bet I have found are Z-order curves but I cannot seem to figure out how to conduct the queries given my two dimensional data-set.

Solutions that are not acceptable include a sequential scan of the data, this could be far too slow.

Evgeny Kluev · Accepted Answer

I think, the most appropriate data structures for your requirements are R-tree and its variants (R*-tree, R+-tree, Hilbert R-tree). R-tree is similar to B+-tree, but also allows multidimensional queries.

Other relevant data structure is Priority Search Tree. It is good for queries like your examples 1 .. 3, but not very efficient if you need frequent updates or on-disk store. For details see this paper or this book: "Handbook of Data Structures and Applications" (Chapter 18.5).

Efficiently querying a B+ Tree holding multidimensional data

Tags:

algorithm

multidimensional-array

data-structures

b-tree

user1670103

1 Answers

Evgeny Kluev

Recent Activity

Donate For Us

Efficiently querying a B+ Tree holding multidimensional data

Tags:

algorithm

multidimensional-array

data-structures

b-tree

user1670103

1 Answers

Evgeny Kluev

Related questions

Recent Activity

Donate For Us