What are some resources I can use to learn profiling/optimizing?

I just inherited a C# project that runs way too slowly, and I will have to start optimizing it. What I want to do first is learn a little more about profiling/optimizing, since I haven't had to do it before. So the question is: where do I start, and what books/blogs/articles can I read?

I do know OF the .NET profilers like ANTS Profiler and so on, but I have no idea how to use them efficiently. I have not really used them; I've just let them run on a few sample apps to play around with the output.

asked Feb 15 '09 by LDomagala


1 Answer

There are two steps to optimizing code.

First, you need to find out what's slow. That's profiling, and, as you might guess, a profiler is commonly used for this. Most profilers are straightforward to use: you run your application through the profiler, and when it terminates, the profiler shows you how much time was spent in each function, both exclusive (time spent in the function itself, not counting calls it makes to other functions) and inclusive (time spent in the function, including child function calls).

In other words, you get a big call tree, and you just have to hunt down the big numbers. Usually, very few functions consume more than 10% of the execution time, so locate those and you know what to optimize.
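
To get a feel for what those inclusive/exclusive numbers mean before reaching for a full profiler, here's a minimal sketch using nothing beyond .NET's standard Stopwatch. ProcessAll and DoChildWork are made-up names standing in for your real code:

    using System;
    using System.Diagnostics;

    class TimingSketch
    {
        static void Main()
        {
            // Inclusive time for ProcessAll: everything it does, child calls included.
            var sw = Stopwatch.StartNew();
            ProcessAll();
            sw.Stop();
            Console.WriteLine("ProcessAll (inclusive): {0} ms", sw.ElapsedMilliseconds);
        }

        static void ProcessAll()
        {
            var total = Stopwatch.StartNew();
            var child = new Stopwatch();

            for (int i = 0; i < 1000; i++)
            {
                child.Start();       // accumulate time spent in the child call
                DoChildWork();
                child.Stop();
            }

            total.Stop();
            // Exclusive time = this method's own work, minus its children.
            Console.WriteLine("ProcessAll (exclusive): {0} ms",
                total.ElapsedMilliseconds - child.ElapsedMilliseconds);
        }

        static void DoChildWork()
        {
            // Stand-in workload for a real helper method.
            double x = 0;
            for (int i = 0; i < 10000; i++) x += Math.Sqrt(i);
        }
    }

A profiler automates exactly this bookkeeping across every function in the program, which is why it's the tool of choice for anything bigger than a toy.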

Note that a profiler is neither necessary nor, necessarily, the best approach. A remarkably simple but effective alternative is to just run the program in a debugger, pause execution at a few quasi-random times, and look at the call stack. Do this just a couple of times, and you get a very good idea of where your execution time is being spent. @Mike Dunlavey, who commented under this answer, has described this approach in depth elsewhere.

But once you know where the execution time is being spent, the tricky part begins: how to optimize the code.

Of course, the most effective approach is often the high-level one. Does the problem have to be solved in this way? Does it have to be solved at all? Could it have been solved in advance and the result cached so it could be delivered instantly when the rest of the app needed it? Are there more efficient algorithms for solving the problem?

If you can apply such high-level optimizations, do that, see if that improved performance sufficiently, and if not, profile again.
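
To make the caching idea above concrete, here's a minimal memoization sketch. GetReport and ComputeReport are hypothetical names, and a real cache would also need an invalidation and size policy:

    using System;
    using System.Collections.Generic;

    class ReportCache
    {
        private readonly Dictionary<int, string> _cache = new Dictionary<int, string>();

        public string GetReport(int customerId)
        {
            string report;
            if (!_cache.TryGetValue(customerId, out report))
            {
                report = ComputeReport(customerId);   // the slow part, done once
                _cache[customerId] = report;
            }
            return report;                            // later calls return instantly
        }

        private string ComputeReport(int customerId)
        {
            // Stand-in for an expensive calculation or query.
            return "report for " + customerId;
        }
    }

The first call for a given ID pays the full cost; every later call is just a dictionary lookup.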

Sooner or later, you may have to dive into more low-level optimizations. This is tricky territory, though. Today's computers are pretty complex, and the performance you get from them is not straightforward. The cost of a branch or a function call can vary widely depending on the context. Adding two numbers together may take anywhere from 0 to 100 clock cycles depending on whether both values were already in the CPU's registers, what else is being executed at the time, and a number of other factors. So optimization at this level requires (1) a good understanding of how the CPU works, and (2) lots of experimentation and measurement. You can easily make a change that you think will be faster, but you need to be sure, so measure the performance before and after each change.
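
As an illustration of "measure before and after", here's a crude micro-benchmark sketch. The two variants being compared (naive string concatenation vs. StringBuilder) are just placeholders for whatever change you're evaluating:

    using System;
    using System.Diagnostics;
    using System.Text;

    class CompareSketch
    {
        static void Main()
        {
            Measure("string +=", () =>
            {
                string s = "";
                for (int i = 0; i < 10000; i++) s += "x";
            });

            Measure("StringBuilder", () =>
            {
                var sb = new StringBuilder();
                for (int i = 0; i < 10000; i++) sb.Append("x");
            });
        }

        static void Measure(string label, Action work)
        {
            work();                          // warm up: exclude JIT compilation
            var sw = Stopwatch.StartNew();
            for (int run = 0; run < 100; run++) work();
            sw.Stop();
            Console.WriteLine("{0}: {1} ms over 100 runs", label, sw.ElapsedMilliseconds);
        }
    }

The warm-up call matters in .NET, since the first execution includes JIT compilation time that would otherwise skew the comparison.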

There are a few general rules of thumb that can often help guide optimizations:

I/O is expensive. CPU instructions are measured in fractions of a nanosecond, RAM accesses are on the order of tens to hundreds of nanoseconds, and a hard drive access may take tens of milliseconds. So often, I/O is what's slowing down your application. Does your application perform a few large I/O reads (read a 20MB file in one big chunk), or countless small ones (read bytes 2,052 to 2,073 from one file, then read a couple of bytes from another file)? Fewer, larger reads can speed your I/O up by a factor of several thousand.
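
Here's a sketch of that contrast, assuming a local file named data.bin (created on the spot so the snippet is self-contained). The buffer size of 1 deliberately defeats FileStream's internal buffering to mimic truly tiny reads:

    using System;
    using System.Diagnostics;
    using System.IO;

    class IoSketch
    {
        static void Main()
        {
            string path = "data.bin";
            File.WriteAllBytes(path, new byte[5000000]);   // 5 MB test file

            var sw = Stopwatch.StartNew();
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 1))   // 1-byte buffer
            {
                while (fs.ReadByte() != -1) { }   // countless tiny reads
            }
            Console.WriteLine("byte-by-byte: {0} ms", sw.ElapsedMilliseconds);

            sw.Restart();
            byte[] all = File.ReadAllBytes(path);             // one big read
            Console.WriteLine("one chunk:    {0} ms ({1} bytes)",
                sw.ElapsedMilliseconds, all.Length);
        }
    }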

Page faults involve hard drive accesses too: in-memory pages have to be pushed out to the page file, and paged-out pages have to be read back into memory. If this happens a lot, it's going to be slow. Can you improve the locality of your data so fewer pages are needed at the same time? Can you simply buy more RAM for the host computer to avoid having to page data out? (As a general rule, hardware is cheap. Upgrading the computer is a perfectly valid optimization, but make sure the upgrade will actually make a difference: disk reads won't be a lot faster on a faster computer, and if everything already fits into RAM on your old system, there's no point in buying one with 8 times as much.)
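
Locality is easy to demonstrate. The two loops below do identical work on identical data and differ only in the order they walk memory: row-major follows the layout of a C# 2D array, while column-major jumps thousands of elements per step, touching far more cache lines and pages:

    using System;
    using System.Diagnostics;

    class LocalitySketch
    {
        static void Main()
        {
            const int N = 4000;
            var grid = new int[N, N];   // 64 MB of contiguous ints
            long sum = 0;

            var sw = Stopwatch.StartNew();
            for (int row = 0; row < N; row++)        // row-major: walks memory in order
                for (int col = 0; col < N; col++)
                    sum += grid[row, col];
            Console.WriteLine("row-major:    {0} ms", sw.ElapsedMilliseconds);

            sw.Restart();
            for (int col = 0; col < N; col++)        // column-major: jumps N ints each step
                for (int row = 0; row < N; row++)
                    sum += grid[row, col];
            Console.WriteLine("column-major: {0} ms", sw.ElapsedMilliseconds);

            Console.WriteLine(sum);                  // keep sum alive so it isn't optimized away
        }
    }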

Your database relies on hard drive accesses too. So can you get away with caching more data in RAM and only occasionally writing it out to the database? (Of course, there's a risk there: what happens if the application crashes?)
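
A minimal write-behind sketch of that trade-off follows. SaveBatch is a hypothetical stand-in for your real data access, and note that anything still sitting in the buffer is lost if the app crashes, which is exactly the risk mentioned above:

    using System;
    using System.Collections.Generic;

    class WriteBehindBuffer
    {
        private readonly List<string> _pending = new List<string>();
        private const int FlushThreshold = 100;

        public void Record(string entry)
        {
            _pending.Add(entry);                 // cheap in-memory append
            if (_pending.Count >= FlushThreshold)
                Flush();
        }

        public void Flush()
        {
            if (_pending.Count == 0) return;
            SaveBatch(_pending);                 // one round-trip instead of 100
            _pending.Clear();
        }

        private void SaveBatch(List<string> entries)
        {
            // Stand-in for a batched INSERT / bulk copy to the real database.
            Console.WriteLine("flushing {0} entries", entries.Count);
        }
    }

In a real app you'd also call Flush on shutdown, and think hard about how much buffered data you can afford to lose.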

And then there's everyone's favorite: threading. A modern CPU has anywhere from 2 to 16 cores available. Are you using them all? Would you benefit from using them? Are there long-running operations that can be executed asynchronously? The application starts the operation in a separate thread and can then resume normal operation instantly, rather than blocking until the operation is complete.
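
Here's a minimal sketch of offloading work, using the thread pool that's been in .NET from the start. GenerateExport is a made-up stand-in for a slow operation:

    using System;
    using System.Threading;

    class AsyncSketch
    {
        static void Main()
        {
            ThreadPool.QueueUserWorkItem(_ => GenerateExport());  // runs on a pool thread

            Console.WriteLine("main loop stays responsive while the export runs...");
            Console.ReadLine();   // keep the process alive for the demo
        }

        static void GenerateExport()
        {
            Thread.Sleep(3000);   // stand-in for a slow operation
            Console.WriteLine("export finished");
        }
    }

In a real application you'd also need a way to report completion or errors back to the caller.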

So basically, use the profiler to understand your application. Where does it spend its execution time? Is memory consumption a problem? What are the I/O patterns (hard drive and network accesses, as well as any other kind of I/O)? Is the CPU just churning away all the time, or is it idle, waiting for some external event such as I/O or a timer?

And then understand as much as possible about the computer it's running on. Understand what resources it has available (CPU cache, multiple cores), and what each of them means for performance.

This is all pretty vague, because the tricks to optimizing a big database server are going to be very different from what you'd do to optimize some big number-crunching algorithm.

answered Nov 15 '22 by jalf