Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Equality of two binary search trees constructed from unordered arrays

Given two unsorted arrays of size N each, we are to determine if the Binary Search Tree constructed from them will be equal or not.

So, the elements of the array are picked and inserted into a basic (no balancing, no red-black, nothing) binary search tree. Given directly two arrays, can we determine whether they give rise to the same Binary Search Tree.

There is an obvious O(N2) worst case time complexity solution: Construct two trees, and compare their equality.

Is there a O(N) or an O(N log N) solution to this?

The idea of the problem that I am trying to extract is: The construction of the BST depends on the relative positions of the elements. For example, if there is an element with value 51 immediately next to 20 in one array, there must be no element between 20 and 51 in the other array for the trees to be equal (otherwise 20's right child would be that number, not 51).

I did try a few approaches:

  1. The naive approach: construct 2 trees and compare.
  2. I used an interesting variant where I'd partition the array into 2 (one smaller sub-array and one sub-array bigger than the pivot), and recursively pass the left array to the left child, and the other to the right. In-place and cheeky, but still O(N2).
  3. A friend tried applying the longest sub-sequence to it and finding it and then comparing, but that is incorrect.
  4. I was making inroads into maybe solving it with a LinkedHashMap, but I am having a tough time proving its correctness.

Help, and any hints towards solving this problem would be much appreciated.

like image 885
Kanishk Avatar asked Dec 30 '12 17:12

Kanishk


People also ask

Can binary search tree have equal values?

In a Binary Search Tree (BST), all keys in left subtree of a key must be smaller and all keys in right subtree must be greater. So a Binary Search Tree by definition has distinct keys.

Can a binary tree be unordered?

A binary unordered tree is a tree where each internal node has two children and the relative order of the subtrees of a node is not important (i.e. two trees are different if they differ only in the respective ordering of subtrees of nodes).

Which of the following properties is correct with respect to binary search tree?

Binary Search Tree is a node-based binary tree data structure which has the following properties: The left subtree of a node contains only nodes with keys lesser than the node's key. The right subtree of a node contains only nodes with keys greater than the node's key.


2 Answers

Summary

I think you can improve the naive approach from O(N^2) to O(NlogN) by using range minimum query to construct the binary tree.

Fast binary tree construction

Suppose we want to construct the binary tree for an array A.

The idea is to first construct an array B where B[i] is the position of the ith largest element in A. This can be done by sorting in O(NlogN).

We can then use range minimum query on array B to allow us to find the minimum value of B[i] for a given range a<=i<=b. In other words, this lets us find the first position in A where we have a value in the range between the ath and bth largest elements.

RMQ takes time O(N) to preprocess, and then queries can be answered in time O(1).

We can then recursively find for each element its left and right children (if any) and check that they match.

Pseudocode

Suppose the two arrays are A and A2, and we assume for simplicity that A,A2 have been preprocessed such that the ith largest element is equal to i.

The trees are identical if find_children(1,N) is True:

find_children(low,high)
   if high==low
      return True
   node = A[RMQ(low,high)]
   return node == A2[RMQ2(low,high)]
          and find_children(low,node-1)
          and find_children(node+1,high)

This function is called once for each node (and empty child pointer) in the tree so takes time O(N).

Overall, this is O(NlogN) as the preprocess sorting takes O(NlogN).

Explanation

Suppose we have entered elements 20 and 51 into a binary tree. We will then have 20 being the root, and 51 being the right child. To find the left child of 51 we need to find the first element in the array which has a value greater than 20, and less than 51. This value is given by our range minimum query applied to the range 20+1->51-1.

We can therefore find the left and right children of all nodes faster than by inserting them into the binary tree in the natural way (only faster in a theoretical worst case - the other methods may well be faster for typical examples).

like image 53
Peter de Rivaz Avatar answered Oct 08 '22 20:10

Peter de Rivaz


"Construct two trees and compare" does not have to be O(N^2). You can use an auxilliary data structure that lets you find the position of the new node in O(log N) instead of O(N), so that the construction of the BST is O(N log N) even if the BST being constructed is not balanced.

With each empty position (i.e. a free child slot in a node) pos in a BST, there is an associated interval (a_pos,b_pos) (one of the values might be +/- infinity), such that new node for value v will be created at pos if and only if the value is in the interval.

You can store the intervals in a balanced interval tree, so that the position for each new arriving value can be found in O(log N). The update of the interval tree is also O(log N), as you only replace one interval with two.

(Actually, the intervals never overlap, so the auxilliary structure can be a plain old (balanced) BST instead of an interval tree.)

Example:

Take the following non-balanced BST, constructed for an array prefix [1,10,2,9,3, ...]

  1
 /  \
a  10
   / \
  2   f
 / \
b   9
   / \
  3   e
 / \
c   d

The letters a-f denote the possible places where a new node can be placed (nil leaves). With each of the letter, there's an associated interval, as follows:

[-inf,1] -> a
[1,2] -> b
[2,3] -> c
[3,9] -> d
[9,10] -> e
[10, +inf] -> f

A new node for a value v will be added into the BST at the place determined by the interval that v belongs to. Zero will end up at a, 5 at d and so on. The key idea is to store this information outside of the tree.

If you can efficiently represent the above table (with links to the actual tree nodes), adding new node to the tree will take O(access to table) + O(1). The O(1) represents adding the node into the non-balanced BST, given that you already know where you place it. Adding 5 will not require comparing with 1,10,2,9 and 3 but instead will be looked up in the table and and placed directly at d.

Once you place the new node, you obviously also have to update the table. The data structure to represent the table could be an interval tree ( http://en.wikipedia.org/wiki/Interval_tree ).

like image 35
Rafał Dowgird Avatar answered Oct 08 '22 21:10

Rafał Dowgird