Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Strange results when plotting (Cormen) Red-black tree insert

I implemented in Red-Black trees in Python according to the pseudo code in Cormen's Introduction to Algorithms.

I wanted to see in my own eyes that my insert is really O(logn) so I plotted the time it takes to insert n=1, 10, 20, ..., 5000 nodes into the tree.

This is the result:

enter image description here

the x-axis is n and the y-axis is the time it took in milliseconds.

To me the graph looks more linear than logarithmic. What can explain that?

like image 434
Zack Avatar asked Mar 01 '12 23:03

Zack


2 Answers

Ok, so the graph displays a measurement of the cost of inserting n elements into your tree, where the x axis is how many elements we've inserted, and the y axis is the total time.

Let's call the function that totals the time it takes to insert n elements into the tree f(n).

Then we can get a rough idea of what f might look like:

f(1) < k*log(1)                 for some constant k.

f(2) < k*log(1) + k*log(2)      for some constant k

...

f(n) < k * [log(1) + log(2) + ... + log(n)]   for some constant k.

Due to how logs work, we can collapse log(1) + ... + log(n):

f(n) < k * [log(1*2*3*...*n)]     for some constant k

f(n) < k * log(n!)                for some constant k

We can take a look at Wikipedia to see a graph of what log(n!) looks like. Take a look at the graph in the article. Should look pretty familiar to you. :)

That is, I think you've done this by accident:

for n in (5000, 50000, 500000):
    startTime = ...
    ## .. make a fresh tree
    ## insert n elements into the tree
    stopTime = ...
    ## record the tuple (n, stopTime - startTime) for plotting

and plotted total time to construct the tree of size n, rather than the individual cost of inserting one element into a tree of size n:

for n in range(50000):
    startTime = ...
    ## insert an element into the tree
    stopTime = ...
    ## record the tuple (n, stopTime - startTime) for plotting

Chris Taylor notes in the comments that if you plot f(n)/n, you'll see a log graph. That's because a fairly tight approximation to log(n!) is n*log(n) (see the Wikipedia page). So we can go back to our bound:

f(n) < k * log(n!)                for some constant k

and get:

f(n) < k * n * log(n)             for some constant k

And now it's should be easier to see that if you divide f(n) by n, your graph will be bounded above by the shape of a logarithm.

like image 166
dyoo Avatar answered Sep 27 '22 22:09

dyoo


5000 might not be large enough to really "see" the logarithm -- try runs at 50000 and 500000. If it takes two seconds and twenty seconds, then linear growth makes sense. If it takes less, then logarithmic makes sense. If you zoom in closely enough on most "simple" functions, the results look pretty linear.

like image 26
sarnold Avatar answered Sep 27 '22 22:09

sarnold