Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to monitor/show progress during a C++ sort

I'm planning to write an interactive C++ geometry processing plug-in that will frequently sort large amounts of data. Although preliminary indications are that the sort will take only a second or two, I'd prefer to show progress during that time - i.e I'd like to update a progress indicator a few times per second. This would be preferable to turning on a wait cursor and leaving the user with a program that freezes for an indeterminate length of time (even if that's just a few seconds).

If I were to use something like std::sort, I could use the comparison function to update the progress indicator every now and then, but I'd have no idea of 'percentage complete'. I could also break the sort down into sub-sorts, updating progress between sub-sorts, and then merging. My best bet may be to write own sort method, although I don't know how much effort it would require to get performance as good as std::sort (and ensure correctness). In any case, that sort method would occasionally send a "percentage complete" to a callback method.

I was wondering whether other people have encountered and solved this problem - I'm hoping that maybe there's a sort method in a standard library that does what I want, or some other technique that I haven't thought of.

Update: Thanks for the great answers so far. There have been a few very good suggestions, and I'm going to hold off on choosing the accepted answer until I've had a chance to test out the ideas in my upcoming project.

Update 2: I completed my project, and this turned out to be a non-issue (at least for the customer. Since they will be selling the software, they may still get feedback from their customers that will change their minds about this). Choosing an accepted answer was difficult because there were many good answers, but in the end the one I chose pointed to a wiki article on Merge Sort that had a very evocative animation. So that's the first strategy I would have pursued had I needed to go ahead with this).

like image 869
brainjam Avatar asked Jun 22 '10 16:06

brainjam


2 Answers

I think, even if you wrote your own sort, that you would have to do a lot of careful measurement if you wanted the progress indicator to be accurate. If you only want an approximate progress indicator, then you can use some metric like 'average distance between compared elements' or 'number of comparisons as compared to average expected number for quicksort' as your metric and implement the comparison idea you already mentioned.

And yes, I assume that you aren't a complete idiot and do not plan on updating the progress indicator at each and every comparison. If you did that you'd spend much more time indicating progress than sorting.

As an example, you would generally expect about n log2 n operations for quicksort. The analysis of how many comparisons are involved is more detailed and can be more accurate than that general measure, but for the purposes of this example, lets just assume. So you could count comparisons and report number_of_comparisons / (n log2 n) as your estimate of progress.

Since that's just an average indicator I would run a few experiments and see how far your estimate is off, and throw in some fudge factors to make it line up with the average expected case. You could also have a progress bar that indicated the uncertainty by having sort of "This is where I think I'll be done." indicator and some space after the indicator.

Even if you used your own sort and came up with a more seemingly precise measure, the progress bar still wouldn't update smoothly and the effect would be similar. The only way you know for sure how long your sort is going to take is if you use a somewhat slower, but really predictable sort, in which case you can predict how long it will take from the number of elements, or use a really fast sort that has less predictable behavior in specific cases, in which case there is no real way to have a perfectly accurate progress bar.

Predictability of subtasks and predictability of total number of comparisons are strongly linked. So I really don't think that subtasks make a better measure than total number of comparisons.

If you want to use your own sort and predictability is your highest goal, go for heapsort. It's still an O(n log2 n) sort, and it's close to being a minimum comparison sort (or so I remember from reading Knuth). It also takes a very predictable amount of time to complete regardless of the dataset its fed. It's one of the slower O(n log2 n) sorts, but still.

As one of your commenters mentioned though, you might be solving a problem that doesn't actually exist. Run some experiments first. The problem is a fun intellectual challenge regardless of its usefulness though. :-)

like image 122
Omnifarious Avatar answered Oct 08 '22 07:10

Omnifarious


Since std::sort is template based, the source should be available in a header. You can make a copy of it and insert your progress callback. The big problem will be predicting how close you are to completion - most sort functions will be based on Quicksort, which doesn't always do the same number of comparisons.

Writing your own Merge sort would be a possibility; the algorithm is easy and the number of steps are well defined.

like image 24
Mark Ransom Avatar answered Oct 08 '22 08:10

Mark Ransom