Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Trie vs B+ tree

Tags:

algorithm

How does Trie and B+ tree compare for indexing lexicographically sorted strings [on the order some billions]? It should support range queries as well.

From perf. as well as implementation complexity point of view.

like image 300
Fakrudeen Avatar asked Apr 22 '10 06:04

Fakrudeen


People also ask

What is the difference between trie and tree?

A tree is a general structure of recursive nodes. There are many types of trees. Popular ones are binary tree and balanced tree. A Trie is a kind of tree, known by many names including prefix tree, digital search tree, and retrieval tree (hence the name 'trie').

Which is better B-tree or B+ tree?

Unlike this, the B+ tree can store more keys than the B-tree of the same level because of their feature of storing pointers only at the leaf nodes. This contributes to storing more entries at fewer levels in the B+ tree and lessens the searching time for a key.

What is the difference between B-tree and AVL tree?

AVL tree is a binary tree while B-tree is a multi-way tree (N-ary tree) i.e. Any node in AVL tree can have at max two child nodes and one piece of information/data while any node in a B-tree can have n nodes and n-1 piece of information/data. For B-tree, n is also known as its order.

What is difference between B-tree and binary tree?

B-Tree : B-Tree is known as a self-balancing tree as its nodes are sorted in the inorder traversal. Unlike the binary trees, in B-tree, a node can have more than two children. B-tree has a height of logM N (Where 'M' is the order of tree and N is the number of nodes).


2 Answers

I would say it depends on what you mean by Range.

If your range is expressed as All words beginning by, then a Trie is the right choice I'd say. On the other hand, Trie are not meant for requests like All words between XX and ZZ.

Note that the branching factor of the B+ Tree affects its performance (the number of intermediary nodes). If h is the height of the tree, then nmax ~~ bh. Therefore h ~~ log(nmax) / log(b).

With n = 1 000 000 000 and b = 100, we have h ~~ 5. Therefore it means only 5 pointer dereferencing for going from the root to the leaf. It's more cache-friendly than a Trie.

Finally, B+ Tree is admittedly more difficult to implement than a Trie: it's more on a Red-Black Tree level of complexity.

like image 91
Matthieu M. Avatar answered Sep 19 '22 03:09

Matthieu M.


Depends on your actual task:

  • If you want to get the whole subtree, a B+Tree is your best choice because it is space efficient.
  • But if you want to get the first N children from a substree, then a Trie is the best choice because you simply visit less nodes than in a B+ Tree scenario.
  • The most popular task which is well-handled by a Trie is a word prefix completion.
like image 29
Denis Bazhenov Avatar answered Sep 20 '22 03:09

Denis Bazhenov