Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is insertion into my tree faster on sorted input than random input?

Tags:

Now I've always heard binary search trees are faster to build from randomly selected data than ordered data, simply because ordered data requires explicit rebalancing to keep the tree height at a minimum.

Recently I implemented an immutable treap, a special kind of binary search tree which uses randomization to keep itself relatively balanced. In contrast to what I expected, I found I can consistently build a treap about 2x faster and generally better balanced from ordered data than unordered data -- and I have no idea why.

Here's my treap implementation:

  • http://pastebin.com/VAfSJRwZ

And here's a test program:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Diagnostics;

namespace ConsoleApplication1
{

    class Program
    {
        static Random rnd = new Random();
        const int ITERATION_COUNT = 20;

        static void Main(string[] args)
        {
            List<double> rndTimes = new List<double>();
            List<double> orderedTimes = new List<double>();

            rndTimes.Add(TimeIt(50, RandomInsert));
            rndTimes.Add(TimeIt(100, RandomInsert));
            rndTimes.Add(TimeIt(200, RandomInsert));
            rndTimes.Add(TimeIt(400, RandomInsert));
            rndTimes.Add(TimeIt(800, RandomInsert));
            rndTimes.Add(TimeIt(1000, RandomInsert));
            rndTimes.Add(TimeIt(2000, RandomInsert));
            rndTimes.Add(TimeIt(4000, RandomInsert));
            rndTimes.Add(TimeIt(8000, RandomInsert));
            rndTimes.Add(TimeIt(16000, RandomInsert));
            rndTimes.Add(TimeIt(32000, RandomInsert));
            rndTimes.Add(TimeIt(64000, RandomInsert));
            rndTimes.Add(TimeIt(128000, RandomInsert));
            string rndTimesAsString = string.Join("\n", rndTimes.Select(x => x.ToString()).ToArray());

            orderedTimes.Add(TimeIt(50, OrderedInsert));
            orderedTimes.Add(TimeIt(100, OrderedInsert));
            orderedTimes.Add(TimeIt(200, OrderedInsert));
            orderedTimes.Add(TimeIt(400, OrderedInsert));
            orderedTimes.Add(TimeIt(800, OrderedInsert));
            orderedTimes.Add(TimeIt(1000, OrderedInsert));
            orderedTimes.Add(TimeIt(2000, OrderedInsert));
            orderedTimes.Add(TimeIt(4000, OrderedInsert));
            orderedTimes.Add(TimeIt(8000, OrderedInsert));
            orderedTimes.Add(TimeIt(16000, OrderedInsert));
            orderedTimes.Add(TimeIt(32000, OrderedInsert));
            orderedTimes.Add(TimeIt(64000, OrderedInsert));
            orderedTimes.Add(TimeIt(128000, OrderedInsert));
            string orderedTimesAsString = string.Join("\n", orderedTimes.Select(x => x.ToString()).ToArray());

            Console.WriteLine("Done");
        }

        static double TimeIt(int insertCount, Action<int> f)
        {
            Console.WriteLine("TimeIt({0}, {1})", insertCount, f.Method.Name);

            List<double> times = new List<double>();
            for (int i = 0; i < ITERATION_COUNT; i++)
            {
                Stopwatch sw = Stopwatch.StartNew();
                f(insertCount);
                sw.Stop();
                times.Add(sw.Elapsed.TotalMilliseconds);
            }

            return times.Average();
        }

        static void RandomInsert(int insertCount)
        {
            Treap<double> tree = new Treap<double>((x, y) => x.CompareTo(y));
            for (int i = 0; i < insertCount; i++)
            {
                tree = tree.Insert(rnd.NextDouble());
            }
        }

        static void OrderedInsert(int insertCount)
        {
            Treap<double> tree = new Treap<double>((x, y) => x.CompareTo(y));
            for(int i = 0; i < insertCount; i++)
            {
                tree = tree.Insert(i + rnd.NextDouble());
            }
        }
    }
}

And here's a chart comparing random and ordered insertion times in milliseconds:

Insertions         Random          Ordered         RandomTime / OrderedTime
50                 1.031665        0.261585        3.94
100                0.544345        1.377155        0.4
200                1.268320        0.734570        1.73
400                2.765555        1.639150        1.69
800                6.089700        3.558350        1.71
1000               7.855150        4.704190        1.67
2000               17.852000       12.554065       1.42
4000               40.157340       22.474445       1.79
8000               88.375430       48.364265       1.83
16000              197.524000      109.082200      1.81
32000              459.277050      238.154405      1.93
64000              1055.508875     512.020310      2.06
128000             2481.694230     1107.980425     2.24

I don't see anything in the code which makes ordered input asymptotically faster than unordered input, so I'm at a loss to explain the difference.

Why is it so much faster to build a treap from ordered input than random input?

like image 800
Juliet Avatar asked Mar 13 '10 08:03

Juliet


2 Answers

Self-balancing trees exist to fix the problems associated non-randomly-distributed data. By definition, they trade away a bit of the best-case performance to vastly improve the worst-case performance associated with non-balanced BSTs, specifically that of sorted input.

You're actually overthinking this problem, because slower insertion of random data vs. ordered data is a characteristic of any balanced tree. Try it on an AVL and you'll see the same results.

Cameron had the right idea, removing the priority check to force the worst case. If you do that and instrument your tree so you can see how many rebalances are happening for each insert, it actually becomes very obvious what's going on. When inserting sorted data, the tree always rotates left and the root's right child is always empty. Insertion always results in exactly one rebalance because the insertion node has no children and no recursion occurs. On the other hand, when you run it on the random data, almost immediately you start to see multiple rebalances happening on every insert, as many as 5 or 6 of them in the smallest case (50 inserts), because it's happening on subtrees as well.

With priority checking turned back on, not only are rebalances typically less expensive due to more nodes being pushed into the left subtree (where they never come out of because no insertions happen there), but they are also less likely to occur. Why? Because in the treap, high-priority nodes float to the top, and the constant left-rotations (not accompanied by right-rotations) start to push all the high-priority nodes into the left subtree as well. The result is that rebalances happen less frequently due to the uneven distribution of probability.

If you instrument the rebalancing code you'll see that this is true; for both the sorted and random input, you end up with almost identical numbers of left-rotations, but the random input also gives the same number of right-rotations, which makes for twice as many in all. This shouldn't be surprising - Gaussian input should result in a Gaussian distribution of rotations. You'll also see that there are only about 60-70% as many top-level rebalances for the sorted input, which perhaps is surprising, and again, that's due to the sorted input messing with the natural distribution of priorities.

You can also verify this by inspecting the full tree at the end of an insertion loop. With the random input, priorities tend to decrease fairly linearly by level; with the sorted input, priorities tend to stay very high until you get to one or two levels from the bottom.

Hopefully I've done a decent job explaining this... let me know if any of it is too vague.

like image 28
Aaronaught Avatar answered Sep 19 '22 20:09

Aaronaught


I ran your code, and I think it has to do with the number of rotations. During ordered input, the number of rotations are optimal, and the tree will never have to rotate back.

During random input the tree will have to perform more rotations, because it may have to rotate back and forth.

To really find out, I would have to add counters for the numbers of left and right rotations for each run. You can probably do this yourself.

UPDATE:

I put breakpoints on rotateleft and rotateright. During ordered input rotateright is never used. During random input, both are hit, and it seems to me that they are used more frequently.

UPDATE 2:

I added some output to the 50 item ordered run (substituting with integers for clarity), to learn more:

TimeIt(50, OrderedInsert)
LastValue = 0, Top.Value = 0, Right.Count = 0, Left.Count = 0
RotateLeft @value=0
LastValue = 1, Top.Value = 1, Right.Count = 0, Left.Count = 1
LastValue = 2, Top.Value = 1, Right.Count = 1, Left.Count = 1
LastValue = 3, Top.Value = 1, Right.Count = 2, Left.Count = 1
RotateLeft @value=3
RotateLeft @value=2
RotateLeft @value=1
LastValue = 4, Top.Value = 4, Right.Count = 0, Left.Count = 4
LastValue = 5, Top.Value = 4, Right.Count = 1, Left.Count = 4
LastValue = 6, Top.Value = 4, Right.Count = 2, Left.Count = 4
RotateLeft @value=6
LastValue = 7, Top.Value = 4, Right.Count = 3, Left.Count = 4
LastValue = 8, Top.Value = 4, Right.Count = 4, Left.Count = 4
RotateLeft @value=8
RotateLeft @value=7
LastValue = 9, Top.Value = 4, Right.Count = 5, Left.Count = 4
LastValue = 10, Top.Value = 4, Right.Count = 6, Left.Count = 4
RotateLeft @value=10
RotateLeft @value=9
RotateLeft @value=5
RotateLeft @value=4
LastValue = 11, Top.Value = 11, Right.Count = 0, Left.Count = 11
LastValue = 12, Top.Value = 11, Right.Count = 1, Left.Count = 11
RotateLeft @value=12
LastValue = 13, Top.Value = 11, Right.Count = 2, Left.Count = 11
RotateLeft @value=13
LastValue = 14, Top.Value = 11, Right.Count = 3, Left.Count = 11
LastValue = 15, Top.Value = 11, Right.Count = 4, Left.Count = 11
RotateLeft @value=15
RotateLeft @value=14
LastValue = 16, Top.Value = 11, Right.Count = 5, Left.Count = 11
LastValue = 17, Top.Value = 11, Right.Count = 6, Left.Count = 11
RotateLeft @value=17
LastValue = 18, Top.Value = 11, Right.Count = 7, Left.Count = 11
LastValue = 19, Top.Value = 11, Right.Count = 8, Left.Count = 11
RotateLeft @value=19
LastValue = 20, Top.Value = 11, Right.Count = 9, Left.Count = 11
LastValue = 21, Top.Value = 11, Right.Count = 10, Left.Count = 11
RotateLeft @value=21
LastValue = 22, Top.Value = 11, Right.Count = 11, Left.Count = 11
RotateLeft @value=22
RotateLeft @value=20
RotateLeft @value=18
LastValue = 23, Top.Value = 11, Right.Count = 12, Left.Count = 11
LastValue = 24, Top.Value = 11, Right.Count = 13, Left.Count = 11
LastValue = 25, Top.Value = 11, Right.Count = 14, Left.Count = 11
RotateLeft @value=25
RotateLeft @value=24
LastValue = 26, Top.Value = 11, Right.Count = 15, Left.Count = 11
LastValue = 27, Top.Value = 11, Right.Count = 16, Left.Count = 11
RotateLeft @value=27
LastValue = 28, Top.Value = 11, Right.Count = 17, Left.Count = 11
RotateLeft @value=28
RotateLeft @value=26
RotateLeft @value=23
RotateLeft @value=16
RotateLeft @value=11
LastValue = 29, Top.Value = 29, Right.Count = 0, Left.Count = 29
LastValue = 30, Top.Value = 29, Right.Count = 1, Left.Count = 29
LastValue = 31, Top.Value = 29, Right.Count = 2, Left.Count = 29
LastValue = 32, Top.Value = 29, Right.Count = 3, Left.Count = 29
RotateLeft @value=32
RotateLeft @value=31
LastValue = 33, Top.Value = 29, Right.Count = 4, Left.Count = 29
RotateLeft @value=33
RotateLeft @value=30
LastValue = 34, Top.Value = 29, Right.Count = 5, Left.Count = 29
RotateLeft @value=34
LastValue = 35, Top.Value = 29, Right.Count = 6, Left.Count = 29
LastValue = 36, Top.Value = 29, Right.Count = 7, Left.Count = 29
LastValue = 37, Top.Value = 29, Right.Count = 8, Left.Count = 29
RotateLeft @value=37
LastValue = 38, Top.Value = 29, Right.Count = 9, Left.Count = 29
LastValue = 39, Top.Value = 29, Right.Count = 10, Left.Count = 29
RotateLeft @value=39
LastValue = 40, Top.Value = 29, Right.Count = 11, Left.Count = 29
RotateLeft @value=40
RotateLeft @value=38
RotateLeft @value=36
LastValue = 41, Top.Value = 29, Right.Count = 12, Left.Count = 29
LastValue = 42, Top.Value = 29, Right.Count = 13, Left.Count = 29
RotateLeft @value=42
LastValue = 43, Top.Value = 29, Right.Count = 14, Left.Count = 29
LastValue = 44, Top.Value = 29, Right.Count = 15, Left.Count = 29
RotateLeft @value=44
LastValue = 45, Top.Value = 29, Right.Count = 16, Left.Count = 29
LastValue = 46, Top.Value = 29, Right.Count = 17, Left.Count = 29
RotateLeft @value=46
RotateLeft @value=45
LastValue = 47, Top.Value = 29, Right.Count = 18, Left.Count = 29
LastValue = 48, Top.Value = 29, Right.Count = 19, Left.Count = 29
LastValue = 49, Top.Value = 29, Right.Count = 20, Left.Count = 29

The ordered items always gets added to the right side of the tree, naturally. When the right side gets bigger than the left, a rotateleft happens. Rotateright never happens. A new top node is selected roughly every time the tree doubles. The randomness of the priority value jitters it a little, so it goes 0, 1, 4, 11, 29 in this run.

A random run reveals something interesting:

TimeIt(50, RandomInsert)
LastValue = 0,748661640914465, Top.Value = 0,748661640914465, Right.Count = 0, Left.Count = 0
LastValue = 0,669427539533669, Top.Value = 0,748661640914465, Right.Count = 0, Left.Count = 1
RotateRight @value=0,669427539533669
LastValue = 0,318363281115127, Top.Value = 0,748661640914465, Right.Count = 0, Left.Count = 2
RotateRight @value=0,669427539533669
LastValue = 0,33133987678743, Top.Value = 0,748661640914465, Right.Count = 0, Left.Count = 3
RotateLeft @value=0,748661640914465
LastValue = 0,955126694382693, Top.Value = 0,955126694382693, Right.Count = 0, Left.Count = 4
RotateRight @value=0,669427539533669
RotateLeft @value=0,33133987678743
RotateLeft @value=0,318363281115127
RotateRight @value=0,748661640914465
RotateRight @value=0,955126694382693
LastValue = 0,641024029180884, Top.Value = 0,641024029180884, Right.Count = 3, Left.Count = 2
LastValue = 0,20709771951991, Top.Value = 0,641024029180884, Right.Count = 3, Left.Count = 3
LastValue = 0,830862050331599, Top.Value = 0,641024029180884, Right.Count = 4, Left.Count = 3
RotateRight @value=0,20709771951991
RotateRight @value=0,318363281115127
LastValue = 0,203250563798123, Top.Value = 0,641024029180884, Right.Count = 4, Left.Count = 4
RotateLeft @value=0,669427539533669
RotateRight @value=0,748661640914465
RotateRight @value=0,955126694382693
LastValue = 0,701743399585478, Top.Value = 0,641024029180884, Right.Count = 5, Left.Count = 4
RotateLeft @value=0,669427539533669
RotateRight @value=0,701743399585478
RotateLeft @value=0,641024029180884
LastValue = 0,675667521858433, Top.Value = 0,675667521858433, Right.Count = 4, Left.Count = 6
RotateLeft @value=0,33133987678743
RotateLeft @value=0,318363281115127
RotateLeft @value=0,203250563798123
LastValue = 0,531275219531392, Top.Value = 0,675667521858433, Right.Count = 4, Left.Count = 7
RotateRight @value=0,748661640914465
RotateRight @value=0,955126694382693
RotateLeft @value=0,701743399585478
LastValue = 0,704049674190604, Top.Value = 0,675667521858433, Right.Count = 5, Left.Count = 7
RotateRight @value=0,203250563798123
RotateRight @value=0,531275219531392
RotateRight @value=0,641024029180884
RotateRight @value=0,675667521858433
LastValue = 0,161392807104342, Top.Value = 0,161392807104342, Right.Count = 13, Left.Count = 0
RotateRight @value=0,203250563798123
RotateRight @value=0,531275219531392
RotateRight @value=0,641024029180884
RotateRight @value=0,675667521858433
RotateLeft @value=0,161392807104342
LastValue = 0,167598206162266, Top.Value = 0,167598206162266, Right.Count = 13, Left.Count = 1
LastValue = 0,154996359793002, Top.Value = 0,167598206162266, Right.Count = 13, Left.Count = 2
RotateLeft @value=0,33133987678743
LastValue = 0,431767346538495, Top.Value = 0,167598206162266, Right.Count = 14, Left.Count = 2
RotateRight @value=0,203250563798123
RotateRight @value=0,531275219531392
RotateRight @value=0,641024029180884
RotateRight @value=0,675667521858433
RotateLeft @value=0,167598206162266
LastValue = 0,173774613614089, Top.Value = 0,173774613614089, Right.Count = 14, Left.Count = 3
RotateRight @value=0,830862050331599
LastValue = 0,76559642412029, Top.Value = 0,173774613614089, Right.Count = 15, Left.Count = 3
RotateRight @value=0,76559642412029
RotateLeft @value=0,748661640914465
RotateRight @value=0,955126694382693
RotateLeft @value=0,704049674190604
RotateLeft @value=0,675667521858433
LastValue = 0,75742144871383, Top.Value = 0,173774613614089, Right.Count = 16, Left.Count = 3
LastValue = 0,346844367844446, Top.Value = 0,173774613614089, Right.Count = 17, Left.Count = 3
RotateRight @value=0,830862050331599
LastValue = 0,787565814232251, Top.Value = 0,173774613614089, Right.Count = 18, Left.Count = 3
LastValue = 0,734950566540915, Top.Value = 0,173774613614089, Right.Count = 19, Left.Count = 3
RotateLeft @value=0,20709771951991
RotateRight @value=0,318363281115127
RotateLeft @value=0,203250563798123
RotateRight @value=0,531275219531392
RotateRight @value=0,641024029180884
RotateRight @value=0,675667521858433
RotateRight @value=0,75742144871383
RotateLeft @value=0,173774613614089
LastValue = 0,236504829598826, Top.Value = 0,236504829598826, Right.Count = 17, Left.Count = 6
RotateLeft @value=0,830862050331599
RotateLeft @value=0,787565814232251
RotateLeft @value=0,76559642412029
RotateRight @value=0,955126694382693
LastValue = 0,895606500048007, Top.Value = 0,236504829598826, Right.Count = 18, Left.Count = 6
LastValue = 0,599106418713511, Top.Value = 0,236504829598826, Right.Count = 19, Left.Count = 6
LastValue = 0,8182332901369, Top.Value = 0,236504829598826, Right.Count = 20, Left.Count = 6
RotateRight @value=0,734950566540915
LastValue = 0,704216948572647, Top.Value = 0,236504829598826, Right.Count = 21, Left.Count = 6
RotateLeft @value=0,346844367844446
RotateLeft @value=0,33133987678743
RotateRight @value=0,431767346538495
RotateLeft @value=0,318363281115127
RotateRight @value=0,531275219531392
RotateRight @value=0,641024029180884
RotateRight @value=0,675667521858433
RotateRight @value=0,75742144871383
LastValue = 0,379157059536854, Top.Value = 0,236504829598826, Right.Count = 22, Left.Count = 6
RotateLeft @value=0,431767346538495
LastValue = 0,46832062046431, Top.Value = 0,236504829598826, Right.Count = 23, Left.Count = 6
RotateRight @value=0,154996359793002
LastValue = 0,0999000217299443, Top.Value = 0,236504829598826, Right.Count = 23, Left.Count = 7
RotateLeft @value=0,20709771951991
LastValue = 0,229543754006524, Top.Value = 0,236504829598826, Right.Count = 23, Left.Count = 8
RotateRight @value=0,8182332901369
LastValue = 0,80358425984326, Top.Value = 0,236504829598826, Right.Count = 24, Left.Count = 8
RotateRight @value=0,318363281115127
LastValue = 0,259324726769386, Top.Value = 0,236504829598826, Right.Count = 25, Left.Count = 8
RotateRight @value=0,318363281115127
LastValue = 0,307835293145774, Top.Value = 0,236504829598826, Right.Count = 26, Left.Count = 8
RotateLeft @value=0,431767346538495
LastValue = 0,453910283024381, Top.Value = 0,236504829598826, Right.Count = 27, Left.Count = 8
RotateLeft @value=0,830862050331599
LastValue = 0,868997387527021, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 8
RotateLeft @value=0,20709771951991
RotateRight @value=0,229543754006524
RotateLeft @value=0,203250563798123
LastValue = 0,218358597354199, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 9
RotateRight @value=0,0999000217299443
RotateRight @value=0,161392807104342
LastValue = 0,0642934488431986, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 10
RotateRight @value=0,154996359793002
RotateLeft @value=0,0999000217299443
LastValue = 0,148295871982489, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 11
LastValue = 0,217621828065078, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 12
RotateRight @value=0,599106418713511
LastValue = 0,553135806020878, Top.Value = 0,236504829598826, Right.Count = 29, Left.Count = 12
LastValue = 0,982277666210326, Top.Value = 0,236504829598826, Right.Count = 30, Left.Count = 12
RotateRight @value=0,8182332901369
LastValue = 0,803671114520948, Top.Value = 0,236504829598826, Right.Count = 31, Left.Count = 12
RotateRight @value=0,203250563798123
RotateRight @value=0,218358597354199
LastValue = 0,19310415405459, Top.Value = 0,236504829598826, Right.Count = 31, Left.Count = 13
LastValue = 0,0133136604043253, Top.Value = 0,236504829598826, Right.Count = 31, Left.Count = 14
RotateLeft @value=0,46832062046431
RotateRight @value=0,531275219531392
RotateRight @value=0,641024029180884
RotateRight @value=0,675667521858433
RotateRight @value=0,75742144871383
LastValue = 0,483394719419719, Top.Value = 0,236504829598826, Right.Count = 32, Left.Count = 14
RotateLeft @value=0,431767346538495
RotateRight @value=0,453910283024381
LastValue = 0,453370328738061, Top.Value = 0,236504829598826, Right.Count = 33, Left.Count = 14
LastValue = 0,762330518459124, Top.Value = 0,236504829598826, Right.Count = 34, Left.Count = 14
LastValue = 0,699010426969738, Top.Value = 0,236504829598826, Right.Count = 35, Left.Count = 14

Rotations happen not so much because the tree is unbalanced, but because of the priorities, which are randomly selected. For example we get 4 rotations at the 13th insertion. We have a tree balanced at 5/7 (which is fine), but get to 13/0! It would seem that the use of random priorities deserves further investigation. Anyhow, it is plain to see that the random inserts cause a lot more rotations, than the ordered inserts.

like image 52
Guge Avatar answered Sep 18 '22 20:09

Guge