I'd like to estimate the number of leaves in a large tree structure for which I can't visit every node exhaustively. Is this algorithm appropriate? Does it have a name? Also, please pedant if I am using any terms improperly.
sum_trials = 0
num_trials = 0
WHILE time_is_not_up
bits = 0
ptr = tree.root
WHILE count(ptr.children) > 0
bits += log2(count(ptr.children))
ptr = ptr.children[rand()%count(ptr.children)]
sum_trials += bits
num_trials++
estimated_tree_size = 2^(sum_trials/num_trials)
This might be possible if you can make some safe assumptions about your tree (such as: is it balanced?) and its usage (is there a safe assumption about how many leaves will be children of the same node?). Better yet would be if you maintain a running tally (counter) everytime you add/remove a leaf node. Then you just access your counter variable in a single operation.
A theory of estimating tree sizes, fundamental for the analysis of exponential-time algorithms and for certain areas of heuristics, is in
Handbook of Satisfiability, IOS Press 2009, ISBN 978-1-58603-929-5 (editors Armin Biere, Marijn J.H. Heule, Hans van Maaren and Toby Walsh) http://www.iospress.nl/book/handbook-of-satisfiability/
namely the chapter 7 "Fundaments of Branching Heuristics" (pages 205-244). The underlying technical report is http://www.swan.ac.uk/compsci/research/reports/2008/CSR7-2008.pdf The chapter is available at http://www.booksonline.iospress.nl/Content/View.aspx?piid=11712
This theory generalises the approach in the above mentioned paper by Donald Knuth (1975, Estimating the Efficiency of Backtrack Programs).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With