B-Trees / B+Trees and duplicate keys

Tags: c#, data-structures, b-tree

I'm investigating the possibility of putting together a custom storage scheme for my application. I think it's worth the effort of potentially reinventing the wheel, because performance and storage efficiency are both main objectives, and the data and the operations on it are far simpler than what an RDBMS provides (no updates, no deletes, a predefined set of queries).

I'm using just a small handful of web resources I've found about B-Trees and B+-Trees - Wikipedia, http://www.bluerwhite.org/btree/, http://slady.net/java/bt/view.php, http://www.brpreiss.com/books/opus6/html/page342.html (the last one is the most valuable).

Duplicate keys

The first problem I'm trying to solve is how to deal with duplicate keys - this tree will be acting as a DB index and for example there won't just be one 'thing' with 'color=red', so looking up 'red' in this tree should yield many results.

I have come up with two solutions so far. The first is simply to have multiple entries in the tree, one per item. But when there are 100,000 or 1,000,000 'red' things in the tree, is that efficient for a tree structure? The second is to have just one entry for each key, with the 'payload' associated with that key pointing to a separate block of data: a linked list pointing to all the items that are 'red'.
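To make the two options concrete, here is a minimal Python sketch of both layouts. It is only an illustration: the `rows` data, the integer record ids, and the function names are assumptions, and a sorted list stands in for the leaf level of a real tree.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical data: (color, record id) pairs to be indexed.
rows = [("red", 1), ("blue", 2), ("red", 3), ("green", 4), ("red", 5)]

# Option 1: one index entry per (key, record id) pair, kept sorted,
# much as the leaf level of a tree would hold them.
entries = sorted(rows)

def lookup_entries(key):
    # All entries for `key` are contiguous, so a lookup is a range scan.
    lo = bisect_left(entries, (key,))
    hi = bisect_right(entries, (key, float("inf")))
    return [rid for _, rid in entries[lo:hi]]

# Option 2: one index entry per distinct key; the payload is a list of
# record ids (on disk this would be a linked list of blocks).
postings = defaultdict(list)
for key, rid in rows:
    postings[key].append(rid)
unique_keys = sorted(postings)  # what the tree itself would store

def lookup_postings(key):
    return postings.get(key, [])

print(lookup_entries("red"))   # [1, 3, 5]
print(lookup_postings("red"))  # [1, 3, 5]
```

Both lookups return the same record ids. The difference is that option 1 stores the key once per item, while option 2 stores it once and keeps the ids in a side list.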

Is there a common / better option?

B+Tree nodes changing types

I wanted to check an assumption I'm making. Say you have a B+-Tree of height 2, where the external (leaf) nodes at level 2 hold 'actual data'. An insertion then necessitates a split of a leaf node, so that leaf node no longer holds 'actual data'. Am I right in thinking that, in implementation terms, because the data might be of substantial size, you would instead store a kind of 'pointer' as the 'actual data', so that if a leaf node becomes a branch node, that pointer (of the same size) is simply updated to point to the new subtree?

By that I mean: internal and external nodes should really be the same size, since external nodes might become internal ones and shuffling data around isn't a good idea?

(Added the C# tag since I'm implementing this from scratch in C#.)

asked Aug 03 '11 08:08

Kieren Johnstone

2 Answers

Kieren, I'm sure you have figured out by now that B+ trees grow by splitting upwards, so a leaf node is always a leaf node and an internal node is always an internal node. A split produces a new sibling of the same type and pushes a separator key into the parent. Eventually you must split the root node, which turns it into two nodes at the same level (two internal nodes, unless the root was still a leaf), and you define a new root above them. So to answer the second part of your question: you don't change node types.
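A minimal Python sketch of that upward growth, under stated assumptions: keys only (no payloads), a hypothetical `ORDER` of 4 keys per node, and class and function names invented for this illustration. It is not a complete B+ tree, just the insert path that shows a leaf splitting into two leaves and the tree growing at the root.

```python
from bisect import bisect_right, insort

ORDER = 4  # hypothetical: maximum keys a node may hold before it splits

class Leaf:
    def __init__(self, keys=None):
        self.keys = keys or []

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys
        self.children = children  # always len(keys) + 1 children

def insert(node, key):
    """Insert below node. Returns (separator, new right sibling) on a split, else None."""
    if isinstance(node, Leaf):
        insort(node.keys, key)
        if len(node.keys) <= ORDER:
            return None
        mid = len(node.keys) // 2
        right = Leaf(node.keys[mid:])    # the new sibling is a leaf as well
        node.keys = node.keys[:mid]
        return right.keys[0], right      # leaf split: separator is copied up
    i = bisect_right(node.keys, key)
    split = insert(node.children[i], key)
    if split is None:
        return None
    sep, right_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= ORDER:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]                  # internal split: separator moves up
    right = Internal(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right

def insert_root(root, key):
    """Insert into the tree and return the (possibly new) root."""
    split = insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Internal([sep], [root, right])  # the only place the height grows

first_leaf = Leaf()
root = first_leaf
for k in range(1, 21):
    root = insert_root(root, k)
```

After these inserts `first_leaf` is still the leftmost leaf of the tree; only new nodes were added beside it and above it.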

Regarding the first part of your question, when you delete a data record from the DB, you will need to find all the keys that point to that particular record, and remove them. If you have to look through long linear lists to do that, deleting will be slow. I am assuming you are using a binary search within a node in order to quickly find the correct node element (key + pointer), so if you make that "node searching" mechanism include the ability to ask for a particular key + pointer combination, you can quickly find the correct key element to remove. In other words, make the data record pointer part of the search (only when searching for a particular data record's key). This does mean that the duplicate keys will be stored in the nodes in "data pointer" order, so as long as ordering of the duplicate keys is not important, this mechanism will work.
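The idea of making the record pointer part of the search can be sketched in a few lines of Python. The entries, the integer "pointers", and the function names below are assumptions for illustration; a single sorted list stands in for the entries of a leaf node.

```python
from bisect import bisect_left

# Leaf entries sorted by (key, record pointer); the pointers are hypothetical record ids.
entries = [("blue", 7), ("red", 2), ("red", 9), ("red", 14), ("yellow", 3)]

def remove_entry(entries, key, pointer):
    """Binary-search for the exact (key, pointer) pair and remove it if present."""
    i = bisect_left(entries, (key, pointer))
    if i < len(entries) and entries[i] == (key, pointer):
        del entries[i]
        return True
    return False

def find_first(entries, key):
    """Index of the first entry for key; an ordinary lookup ignores the pointer."""
    return bisect_left(entries, (key,))

print(remove_entry(entries, "red", 9))  # True
print(entries)  # [('blue', 7), ('red', 2), ('red', 14), ('yellow', 3)]
```

Because duplicates are ordered by pointer, removing one specific entry is a single binary search rather than a scan over every entry that shares the key.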

answered Oct 10 '22 03:10

Ken Kopelson

Attempting to answer my own question. I would welcome other answers too.

Duplicate Keys

If duplicate entries for the same key are a possibility, the tree will store a reference to a list (in memory) or a linked list (on disk) of the items with the given key.

B+Tree nodes, changing types

In-memory, my nodes have an object reference, which can point to another node (in itself another valid B+Tree) in the case of an internal/branch node, or indeed data directly in the case of an external/leaf node. On disk, this would work in a very similar way: a 64-bit value for each 'link slot', as I have chosen to name them - either an offset in the file if pointing at a sub-node, or a block number if pointing to data directly (or the head of a linked-list in the case mentioned in the first part of the question).
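As a rough illustration of the on-disk side, here is a Python sketch of a fixed-size node record with 64-bit link slots. The layout is an assumption made up for this example (a leaf flag, a key count, integer keys, a tiny `MAX_KEYS`), not the format I have settled on; the point is only that leaf and internal nodes serialize to the same size.

```python
import struct

MAX_KEYS = 4  # hypothetical fan-out, kept tiny for illustration

# Assumed layout: 1 byte leaf flag, 1 byte key count,
# MAX_KEYS signed 64-bit keys, MAX_KEYS + 1 unsigned 64-bit link slots.
NODE_FORMAT = f"<BB{MAX_KEYS}q{MAX_KEYS + 1}Q"
NODE_SIZE = struct.calcsize(NODE_FORMAT)

def pack_node(is_leaf, keys, links):
    """Serialize a node; unused key and link slots are zero-filled."""
    padded_keys = list(keys) + [0] * (MAX_KEYS - len(keys))
    padded_links = list(links) + [0] * (MAX_KEYS + 1 - len(links))
    return struct.pack(NODE_FORMAT, int(is_leaf), len(keys), *padded_keys, *padded_links)

def unpack_node(buf):
    """Return (is_leaf, keys, links) from a packed node."""
    fields = struct.unpack(NODE_FORMAT, buf)
    is_leaf, count = bool(fields[0]), fields[1]
    keys = list(fields[2:2 + count])
    all_links = fields[2 + MAX_KEYS:]
    # A leaf uses one link slot per key; an internal node has one more child than keys.
    links = list(all_links[:count if is_leaf else count + 1])
    return is_leaf, keys, links

# In a leaf the slots hold data block numbers; in an internal node, file offsets.
leaf = pack_node(True, [10, 20], [501, 502])
branch = pack_node(False, [15], [4096, 8192])
print(NODE_SIZE, len(leaf), len(branch))  # 74 74 74
```

How a slot is interpreted depends only on the node's leaf flag, so the slot itself needs no type tag.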

answered Oct 10 '22 03:10

Kieren Johnstone
