I would like to ask if someone knows a performant way to store the path from the root node to a new node of a multiway tree during the insertion of the new node. E.g., if I have the following tree:
For each node, I currently store an array of the path to the node from the root node during insertion in the following way by assigning a unique int
ID to each children on the same depth:
Root node -> [1]
Depth 1, child 1 of root -> [1, 1]
Depth 1, child 2 of root -> [1, 2]
Depth 2, child 1 of parent 1 -> [1, 1, 1]
Depth 2, child 2 of parent 1 -> [1, 1, 2]
Depth 2, child 3 of parent 1 -> [1, 1, 3]
Depth 2, child 1 of parent 2 -> [1, 2, 4]
Depth 2, child 2 of parent 2 -> [1, 2, 5]
Depth 3, child 1 of parent 3 -> [1, 1, 3, 1]
...
If I now insert a new node from the leaf node 1
on depth 3, I would have to create a new path array for it storing all the nodes of the parent 1
(i.e. [1, 1, 3, 1]
) plus the new child ID, which is 1
for the first child:
Depth 4, child 1 of parent 1 -> [1, 1, 3, 1, 1]
As my tree grows a lot in height (the number of children per depth is relatively low, but the depth can be high), the slow part of this algorithm would be this array recreation process. Just imagine a tree of depth 1.000.000
, if I insert a new node from a node of depth 1.000.000
, I would have to create a new array for this new node storing all the 1.000.001
IDs of the parent plus appending the new node's ID:
Depth 1.000.001, child 1 of parent x -> [...1 million and one IDs... , 1]
Is there a more efficient way to store the path on each node during node's insertion?
I basically need this to determine if any given node is a child of a possible parent node in the tree and as I have the path stored in each node, I can easily do that by checking the path array of the child, like this:
// Ex. 1
Is node 4 of depth 2 the parent/ancestor of node 1 of depth 3?
node 1 of depth 3 has the following path array: pathArray = [1, 1, 3, 1]
Its ancestor ID on depth 2 is: pathArray[2] -> 3
3 != 4 and therefore I know that node 4 of depth 2
is not a parent of node 1 of depth 3.
// Ex. 2
Is node 1 of depth 1 the parent/ancestor of node 1 of depth 3?
node 1 of depth 3 has the following path array: pathArray = [1, 1, 3, 1]
Its ancestor ID on depth 1 is: pathArray[1] -> 1
1 == 1 and therefore I know that node 1 of depth 1
is a parent of node 1 of depth 3.
This lookup operation would be fast, the problem is the creation of the path array as the tree goes deeper.
Any suggestions would be appreciated.
Thank you for the attention.
Every node id appears in children sum except root. So if we do the sum of all ids and subtract it from the sum of all children's sums, we get the root.
A path in a binary tree is a sequence of nodes where each pair of adjacent nodes in the sequence has an edge connecting them. A node can only appear in the sequence at most once. Note that the path does not need to pass through the root. The path sum of a path is the sum of the node's values in the path.
Right now, your solution has O(1)
lookup time, O(h)
insert time and O(n^2)
space conpelxity, where n
is the number of nodes and' h
is the height, which is at most n
.
You can achieve a tradeoff with O(log n)
lookup, O((log n)^2)
insert and O(n log n)
space in the following way:
Let every node store one jump pointer to each of its ancestors with distance 1 (its parent), 2 (grandparent), 4 (grandparent's grandparent), 8, 16 and so on, until the root is reached or passed. The maximum distance from any node to the root is n
, so for every node you store O(log n)
jump-pointers. Since you do this for every node, the total space complexity is O(n log n)
.
Answering the query of whether y
is an ancestor of x
is trivial if y
doesn't have a lower depth than x
. Name the depths of the nodes dy
and dx
. You know that if y
is an ancestor of x
, then it is the dx-dy
'th ancestor of x
. That is, if dy = 5
and dx = 17
, you know that if y
is x
's ancestor, then it is 17 - 5
levels above x
.
Therefore, you can perform lookups by recursively jumping the largest possible distance upwards in the tree from x
without overshooting the target ancestor. For instance, if you're starting at depth 16 and want to find the ancestor at depth 6, you're interested in the ancestor 10 levels above. You cannot jump 16 levels up, as this would overshoot the target ancestor, so you jump 8 instead. Now you're at depth 16-8=8, and the remaining distance to the target ancestor, which is 6, is 2. Since there is a pointer which goes exactly two steps up, you follow that and you've arrived at the target ancestor.
Every time you follow a pointer upwards in the tree, you're getting at least half way to your target, so the maximum number of pointers you can follow is O(log n)
.
When inserting a node e
as a child of another node x
you can construct e
's jump-pointers by finding x
's ancestors with distance 1, 3, 7, 15, etc. (since e
is one level further away from all of these than x
is). There are O(log n)
such searches. As we argued above, each of the lookups take O(log n)
time. Thus the total is O((log n)^2)
.
This operation might even be made even faster by storing some additional information, but I can't see exactly how just now.
NOTE
This idea is actually a part of the classical solution for the Level Ancestor Problem. The classical solution allows for lookups as you have described them in O(1)
time, while keeping the space of the entire data structure to O(n)
. However, the data structure is static, so the solution does not specify how to do insertions. There might be a way to adapt the level ancestor to a dynamic scenario and get even better running times than I've described here, but I'm not sure how.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With