I'm trying to balance a set of (Million +) 3D points using a KD-tree and I have two ways of doing it.
Way 1:
Use an O(n) algorithm to find the arraysize/2-th largest element along a given axis and store it at the current node
Iterate over all the elements in the vector and for each, compare them to the element I just found and put those smaller in newArray1, and those larger in newArray2
Recurse
Way 2:
Use quicksort O(nlogn) to sort all the elements in the array along a given axis, take the element at position arraysize/2 and store it in the current node.
Then put all the elements from index 0 to arraysize/2-1 in newArray1, and those from arraysize/2 to arraysize-1 in newArray2
Recurse
Way 2 seems more "elegant" but way 1 seems faster since the median search and the iterating are both O(n) so I get O(2n) which just reduces to O(n). But then at the same time, even though way 2 is O(nlogn) time to sort, splitting up the array into 2 can be done in constant time, but does it make up for the O(nlogn) time for sorting?
What should I do? Or is there an even better way to do this that I'm not even seeing?
How about Way 3:
Use an O(n) algorithm such as QuickSelect to ensure that the element at position length/2 is the correct element, all elements before are less, and all afterwards are larger than it (without sorting them completely!) - this is probably the algorithm you used in your Way 1 step 1 anyway...
Recurse into each half (except middle element) and repeat with next axis.
Note that you actually do not need to make "node" objects. You can actually keep the tree in a large array. When searching, start at length/2 with the first axis.
I've seen this trick being used by ELKI. It uses very little memory and code, which makes the tree quite fast.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With