Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Deleting a node in a BK Tree

I have seen many different implementations of BK Trees in many different languages, and literally none of them seem to include a way to remove nodes from the tree.

Even the original article where BK Trees were first introduced does not provide a meaningful insight about node deletion, as the authors merely suggest to mark the node to be deleted so that it is ignored:

The deletion of a key in Structures 1 [the BK Tree] and 2 follows a process similar to that above, with special consideration for the case in which the key to be deleted is the representative x° [root key]. In this case, the key cannot simply be deleted, as it is essential for the structure information. Instead an extra bit must be used for each key which denotes whether the key actually corresponds to a record or not. The search algorithm is modified correspondingly to ignore keys which do not correspond to records. This involves testing the extra bit in the Update procedure.

While it may be theoretically possible to properly delete a node in a BK Tree, is it possible to do so in linear/sublinear time?

like image 427
farsil Avatar asked Feb 14 '17 15:02

farsil


People also ask

How do you remove a node from a tree?

Algorithm: Starting at the root, find the deepest and rightmost node in the binary tree and the node which we want to delete. Replace the deepest rightmost node's data with the node to be deleted. Then delete the deepest rightmost node.

How do I delete an element from my ab tree?

Deleting an element on a B-tree consists of three main events: searching the node where the key to be deleted exists, deleting the key and balancing the tree if required. While deleting a tree, a condition called underflow may occur.


1 Answers

While it may be theoretically possible to properly delete a node in a BK Tree, is it possible to do so in linear/sublinear time?

If you want to physically remove it from a BK-Tree, then I can't think of a way to do this in a linear time for all cases. Consider 2 scenarios, when a node is removed. Please note that I do not account for a time complexity related to calculating the Levenshtein distance because that operation doesn't depend on the number of words, although it requires some processing time too.

Remove non-root node

  1. Find a parent of the node in the tree.
  2. Save node's child nodes.
  3. Nullify parent's reference to the node.
  4. Re-add each child node as if it were a new node.

Here, even if step 1 can be done in O(1), steps 2 and 4 are way more expensive. Inserting a single node is O(h), where h is a height of tree. To make matters worse, this has to be done for each child node of the original node, and so it will be O(k*h), where k is a number of child nodes.

Remove root node

  1. Rebuild the tree from scratch without using the previous root node.

Rebuilding a tree will be at least O(n) in the best case and O(h*n) otherwise.

Alternative solution

That's why it's better not to delete a node physically, but keep it in a tree and just mark it as deleted. This way it will be used, as before, for inserting new nodes, but will be excluded from suggestion results for a misspelled word. This can be done in O(1).

like image 72
Anatolii Avatar answered Sep 28 '22 02:09

Anatolii