I have seen many different implementations of BK Trees in many different languages, and literally none of them seem to include a way to remove nodes from the tree.
Even the original article where BK Trees were first introduced does not provide a meaningful insight about node deletion, as the authors merely suggest to mark the node to be deleted so that it is ignored:
The deletion of a key in Structures 1 [the BK Tree] and 2 follows a process similar to that above, with special consideration for the case in which the key to be deleted is the representative x° [root key]. In this case, the key cannot simply be deleted, as it is essential for the structure information. Instead an extra bit must be used for each key which denotes whether the key actually corresponds to a record or not. The search algorithm is modified correspondingly to ignore keys which do not correspond to records. This involves testing the extra bit in the Update procedure.
While it may be theoretically possible to properly delete a node in a BK Tree, is it possible to do so in linear/sublinear time?
Algorithm: Starting at the root, find the deepest and rightmost node in the binary tree and the node which we want to delete. Replace the deepest rightmost node's data with the node to be deleted. Then delete the deepest rightmost node.
Deleting an element on a B-tree consists of three main events: searching the node where the key to be deleted exists, deleting the key and balancing the tree if required. While deleting a tree, a condition called underflow may occur.
While it may be theoretically possible to properly delete a node in a BK Tree, is it possible to do so in linear/sublinear time?
If you want to physically remove it from a BK-Tree, then I can't think of a way to do this in a linear time for all cases. Consider 2
scenarios, when a node is removed. Please note that I do not account for a time complexity related to calculating the Levenshtein distance because that operation doesn't depend on the number of words, although it requires some processing time too.
Here, even if step 1
can be done in O(1), steps 2
and 4
are way more expensive. Inserting a single node is O(h), where h is a height of tree. To make matters worse, this has to be done for each child node of the original node, and so it will be O(k*h), where k is a number of child nodes.
Rebuilding a tree will be at least O(n) in the best case and O(h*n) otherwise.
That's why it's better not to delete a node physically, but keep it in a tree and just mark it as deleted
. This way it will be used, as before, for inserting new nodes, but will be excluded from suggestion results for a misspelled word. This can be done in O(1).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With