Improving treap implementation

Here's my implementation of a sort of treap (with implicit keys and some additional information stored in nodes): http://hpaste.org/42839/treap_with_implicit_keys

According to profiling data GC takes 80% of time for this program. As far as I understand, it's caused by the fact that every time a node is 'modified', each node on the path to the root is recreated.
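To give an idea of the shape involved, here's a simplified sketch (the real code, with the extra per-node information, is in the paste above; the names and field layout below are made up for illustration):

  -- Simplified sketch of an implicit-key treap node.
  data Tree a
    = Leaf
    | Node !Int     -- implicit key: subtree size
           !Int     -- heap priority
           a
           (Tree a)
           (Tree a)

  -- A point update at index i: every Node on the path from the root down to
  -- the target is re-allocated, which is where the allocation (and hence the
  -- GC time) comes from.
  adjust :: (a -> a) -> Int -> Tree a -> Tree a
  adjust _ _ Leaf = Leaf
  adjust f i (Node s p x l r)
    | i <  sl   = Node s p x (adjust f i l) r
    | i == sl   = Node s p (f x) l r
    | otherwise = Node s p x l (adjust f (i - sl - 1) r)
    where
      sl = subSize l
      subSize Leaf             = 0
      subSize (Node n _ _ _ _) = n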

Is there something I can do here to improve performance, or do I have to descend into the realm of the ST monad?

asked Jan 07 '11 by adamax


1 Answer

Using GHC 7.0.3, I can reproduce your heavy GC behavior:

  $ time ./A +RTS -s
  %GC time      92.9%  (92.9% elapsed)
  ./A +RTS -s  7.24s user 0.04s system 99% cpu 7.301 total

I spent 10 minutes going through the program. Here's what I did, in order:

  • Set GHC's -H flag, increasing limits in the GC
  • Check unpacking
  • Improve inlining
  • Adjust the first generation allocation area

Resulting in a 10-fold speedup, with GC down to around 45% of the runtime.


In order, using GHC's magic -H flag, we can reduce that runtime quite a bit:

  $ time ./A +RTS -s -H
  %GC time      74.3%  (75.3% elapsed)
  ./A +RTS -s -H  2.34s user 0.04s system 99% cpu 2.392 total

Not bad!

The UNPACK pragmas on the Tree nodes won't do anything, so remove those.

Inlining update shaves off more runtime:

 ./A +RTS -s -H  1.84s user 0.04s system 99% cpu 1.883 total

as does inlining height

 ./A +RTS -s -H  1.74s user 0.03s system 99% cpu 1.777 total
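In code, those two source-level changes amount to something like this. It's only a sketch against a guessed field layout, with made-up stand-ins for the paste's update and height: UNPACK stays only on strict single-constructor fields such as the Ints (on the recursive Tree fields it can never fire), and the small node-rebuilding helpers get INLINE pragmas so GHC can inline them at every call site on the way back up to the root.

  -- Sketch only: the field layout and helper names are guesses at the paste's code.
  data Tree a
    = Leaf
    | Node {-# UNPACK #-} !Int   -- subtree size: UNPACK works on a strict Int
           {-# UNPACK #-} !Int   -- priority: likewise
           a
           (Tree a)              -- UNPACK on a recursive sum type is ignored
           (Tree a)

  size :: Tree a -> Int
  size Leaf             = 0
  size (Node s _ _ _ _) = s
  {-# INLINE size #-}

  -- Rebuild a node after its children change (the paste's update/height
  -- would get the same treatment).
  update :: Int -> a -> Tree a -> Tree a -> Tree a
  update p x l r = Node (1 + size l + size r) p x l r
  {-# INLINE update #-}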

So while it is fast, GC is still dominating -- since we're testing allocation, after all. One thing we can do is increase the first gen size:

 $ time ./A +RTS -s -A200M
 %GC time      45.1%  (40.5% elapsed)
 ./A +RTS -s -A200M  0.71s user 0.16s system 99% cpu 0.872 total
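If you don't want to type the RTS flag on every run, you can bake it in at compile time with GHC 7's -rtsopts/-with-rtsopts (the source file name A.hs is an assumption here):

 $ ghc -O2 -rtsopts -with-rtsopts="-A200M" A.hs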

And increasing the unfolding threshold, as JohnL suggested, helps a little,

 ./A +RTS -s -A100M  0.74s user 0.09s system 99% cpu 0.826 total

which is what, 10x faster than we started? Not bad.
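For reference, the unfolding threshold JohnL mentioned is a compile-time flag rather than an RTS one, along these lines (the exact value is something to experiment with):

 $ ghc -O2 -funfolding-use-threshold=32 A.hs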


Using ghc-gc-tune, you can see runtime as a function of -A and -H,

[graph: running time and GC as a function of the -A and -H settings]
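For reference, ghc-gc-tune is run against the compiled binary, roughly like so (it sweeps the -A/-H settings and plots the results; check its documentation for the exact options):

 $ ghc-gc-tune ./A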

Interestingly, the best running times use very large -A values, e.g.

$ time ./A +RTS -A500M   
./A +RTS -A500M  0.49s user 0.28s system 99% cpu 0.776s
answered Oct 29 '22 by Don Stewart