Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Estimating the size of a tree

I'd like to estimate the number of leaves in a large tree structure for which I can't visit every node exhaustively. Is this algorithm appropriate? Does it have a name? Also, please pedant if I am using any terms improperly.

sum_trials = 0
num_trials = 0
WHILE time_is_not_up
    bits = 0
    ptr = tree.root
    WHILE count(ptr.children) > 0
         bits += log2(count(ptr.children))
         ptr = ptr.children[rand()%count(ptr.children)]
    sum_trials += bits
    num_trials++
estimated_tree_size = 2^(sum_trials/num_trials)
like image 451
William Entriken Avatar asked May 07 '10 14:05

William Entriken


2 Answers

This might be possible if you can make some safe assumptions about your tree (such as: is it balanced?) and its usage (is there a safe assumption about how many leaves will be children of the same node?). Better yet would be if you maintain a running tally (counter) everytime you add/remove a leaf node. Then you just access your counter variable in a single operation.

like image 181
FrustratedWithFormsDesigner Avatar answered Nov 11 '22 20:11

FrustratedWithFormsDesigner


A theory of estimating tree sizes, fundamental for the analysis of exponential-time algorithms and for certain areas of heuristics, is in

Handbook of Satisfiability, IOS Press 2009, ISBN 978-1-58603-929-5 (editors Armin Biere, Marijn J.H. Heule, Hans van Maaren and Toby Walsh) http://www.iospress.nl/book/handbook-of-satisfiability/

namely the chapter 7 "Fundaments of Branching Heuristics" (pages 205-244). The underlying technical report is http://www.swan.ac.uk/compsci/research/reports/2008/CSR7-2008.pdf The chapter is available at http://www.booksonline.iospress.nl/Content/View.aspx?piid=11712

This theory generalises the approach in the above mentioned paper by Donald Knuth (1975, Estimating the Efficiency of Backtrack Programs).

like image 20
Oliver Kullmann Avatar answered Nov 11 '22 20:11

Oliver Kullmann