Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently querying a B+ Tree holding multidimensional data

I have a collection of tuples (x,y) of 64-bit integers that make up my dataset. I have, say, trillions of these tuples; it is not feasible to keep the dataset in memory on any machine on earth. However, it is quite reasonable to store them on disk.

I have an on-disk store (a B+-tree) that allow for the quick, and concurrent, querying of data in a single dimension. However, some of my queries rely on both dimensions.

Query examples:

  • Find the tuple whose x is greater than or equal than some given value
  • Find the tuple whose x is as small as possible s.t. it's y is greater than or equal to some given value
  • Find the tuple whose x is as small as possible s.t. it's y is less than or equal to some given value
  • Perform maintenance operations (insert some tuple, remove some tuple)

The best bet I have found are Z-order curves but I cannot seem to figure out how to conduct the queries given my two dimensional data-set.

Solutions that are not acceptable include a sequential scan of the data, this could be far too slow.

like image 990
user1670103 Avatar asked Nov 03 '22 15:11

user1670103


1 Answers

I think, the most appropriate data structures for your requirements are R-tree and its variants (R*-tree, R+-tree, Hilbert R-tree). R-tree is similar to B+-tree, but also allows multidimensional queries.

Other relevant data structure is Priority Search Tree. It is good for queries like your examples 1 .. 3, but not very efficient if you need frequent updates or on-disk store. For details see this paper or this book: "Handbook of Data Structures and Applications" (Chapter 18.5).

like image 194
Evgeny Kluev Avatar answered Nov 09 '22 07:11

Evgeny Kluev