
Any papers that explore performance issues and optimization strategies available to C++-based COM applications?

Caveat: I'm not sure if this can be deemed as a proper SO programming question!

I ran into severe performance penalties while working with the MS Office suite, mainly due to the millions of COM calls I make to process documents. Part of the problem was fixed by using the OOXML SDK instead of the native application's API. However, the OOXML SDK itself makes COM calls, and this is slowing things down (yes, I have duly run both Visual Studio's built-in performance analyzer and BoundsChecker, and made sure that the algorithms are the best we can use throughout). I found that a layer of caching speeds things up quite a bit, sometimes reducing the execution time by one-fourth, but obviously the speed-up varies with my access pattern, which in turn is governed by the document's content structure.
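To make the caching idea concrete, here is a minimal sketch in portable C++, not my actual code. `FetchFn` and `ParagraphCache` are hypothetical names; `FetchFn` stands in for whatever expensive cross-boundary call returns a piece of the document, and the cache assumes the document is not modified while it is in use.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an expensive cross-boundary call, e.g. asking
// the document server for the text of paragraph `index`.
using FetchFn = std::function<std::string(int)>;

// Memoizing wrapper: each distinct index crosses the boundary at most once.
// Assumption: the underlying document does not change while cached values
// are in use; call invalidate() after any modification.
class ParagraphCache {
public:
    explicit ParagraphCache(FetchFn fetch) : fetch_(std::move(fetch)) {
        assert(fetch_ && "a fetch function is required");
    }

    const std::string& get(int index) {
        auto it = cache_.find(index);
        if (it == cache_.end()) {
            ++misses_;  // only a miss pays for the expensive call
            it = cache_.emplace(index, fetch_(index)).first;
        }
        return it->second;
    }

    void invalidate() { cache_.clear(); }

    // Number of times the expensive call was actually made.
    int misses() const { return misses_; }

private:
    FetchFn fetch_;
    std::unordered_map<int, std::string> cache_;
    int misses_ = 0;
};
```

How much this helps depends entirely on how often the same index is requested again, which is why the speed-up follows the access pattern.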

Given that both COM and C++ have been around for so long, I am surprised to see so little material on optimizing C++-based COM applications. (A quick search on Google should suffice to prove my point, though I wouldn't mind being proven wrong!)

  • So, it would be great if you guys could help me dig out a few relevant papers from the depths of the internet.
  • Also, (since my work is so obvious) is it still worth writing up my experience as a paper?

Edit: Clarification: I'm not really looking for an alternative (since it is too late to change the underlying technology). I'm interested in reading up on similar problems people may have faced in the past and how they worked around the limitations.

dirkgently asked Jun 16 '11 20:06



2 Answers

It's highly likely that C++ is not to blame. More likely, something like marshalling kicks in and consumes most of the time. Don't forget that you will have marshalling for in-proc servers as well, whenever the consumer and server threading models are incompatible. You can also spend a lot of time on synchronization in certain cases.

Getting rid of marshalling or optimizing it (there's such a thing as the "free-threaded marshaller", which I don't fully understand myself but which looks promising in terms of performance) will give you a huge boost: every call will go directly instead of through a ton of wiring. Again, tuning synchronization (making it fine-grained and minimal) can also improve performance.

We once had a severe performance problem in an STA component: calls from different consumer threads would go through a proxy and get serialized. Since each call would block for a long time (waiting for a backend to perform complex data processing), all other threads were just hanging there waiting for their turn, and the server would serve one request at a time. We redesigned the call so that it would just "post" the request, and a COM event would fire once processing was complete. This solved the problem: the "waiting" was moved outside the call, so COM synchronization would no longer block all threads for too long and inhibit parallelism. This is not specific to any language; it is just how COM concurrency works. You find such issues by carefully logging all calls and reviewing the logs.
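The shape of that redesign can be sketched in portable C++ without any COM at all. This is only an illustration under stated assumptions, not our component: `AsyncBackend`, `post` and the "processed:" result are hypothetical, and a plain callback stands in for the COM event.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical backend: post() returns immediately, and the callback
// (standing in for a COM event) fires once processing has finished.
class AsyncBackend {
public:
    using Callback = std::function<void(const std::string&)>;

    AsyncBackend() : worker_([this] { run(); }) {}

    ~AsyncBackend() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();  // remaining requests are drained before exit
    }

    // The caller only enqueues; no caller waits inside this call.
    void post(std::string request, Callback on_done) {
        assert(on_done && "a completion callback is required");
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(Job{std::move(request), std::move(on_done)});
        }
        wake_.notify_one();
    }

private:
    struct Job {
        std::string request;
        Callback on_done;
    };

    void run() {
        for (;;) {
            Job job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
                if (pending_.empty()) return;  // stopping and nothing left
                job = std::move(pending_.front());
                pending_.pop();
            }
            // The slow processing happens here, outside the lock, so other
            // threads can keep posting while this request is being served.
            job.on_done("processed:" + job.request);
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<Job> pending_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist first
};
```

Note that the callback runs on the worker thread, so the consumer has to be prepared to receive the notification there, just as with an event fired from another apartment.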

Since you ask about the C++ part: you can of course profile, and C++ code can be profiled in great detail. IMO it's not likely that you'll find something worth attention, but you don't know until you profile; maybe there's something really dumb in your code. One thing that can be optimized is reducing thread-safety overhead to the level that is just enough for your threading model.

sharptooth answered Oct 29 '22 21:10



COM is language/platform independent, by design.
So looking for C++-specific optimization methods is a little out of context. To the COM platform and COM servers, a C++ client is just one client among many, only running as more optimized machine code.

COM is a protocol/architecture for client-server interaction and communication.
So minimizing the number of server accesses matters more than optimizing each individual access.
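As a rough illustration of "minimizing server access", the sketch below compares a chatty access pattern with a chunky one. Everything here is hypothetical: `FakeDocumentServer` is a local stand-in rather than a real COM server, and `round_trips` simply counts how often the client crosses the boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a COM server holding document cells.
class FakeDocumentServer {
public:
    explicit FakeDocumentServer(std::vector<std::string> cells)
        : cells_(std::move(cells)) {}

    // Chatty interface: one boundary crossing per cell.
    std::string cell(std::size_t i) {
        ++round_trips;
        return cells_.at(i);
    }

    // Chunky interface: one boundary crossing for a whole range.
    std::vector<std::string> range(std::size_t first, std::size_t count) {
        assert(first + count <= cells_.size());
        ++round_trips;
        auto begin = cells_.begin() + static_cast<std::ptrdiff_t>(first);
        return std::vector<std::string>(begin, begin + static_cast<std::ptrdiff_t>(count));
    }

    int round_trips = 0;  // number of simulated boundary crossings

private:
    std::vector<std::string> cells_;
};

// Sums the text length of the first n cells, asking for one cell at a time.
std::size_t total_length_chatty(FakeDocumentServer& server, std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += server.cell(i).size();
    return total;
}

// Same result, but fetches all n cells in a single call and works locally.
std::size_t total_length_chunky(FakeDocumentServer& server, std::size_t n) {
    std::size_t total = 0;
    for (const std::string& text : server.range(0, n)) total += text.size();
    return total;
}
```

Both functions compute the same value; the chatty one makes n crossings and the chunky one makes a single crossing, which is the cost that dominates when each crossing is expensive.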

On the other hand, some COM servers provide low-level interfaces that are only available to C/C++ clients. The IE WebBrowser control is the best example, I think. For these COM servers, using C++ could yield a big performance improvement. But AFAIK the MS Office suite doesn't provide such low-level interfaces.

That is, whether you build your MS Office access module in C++, C#, or VB6, the COM-specific processing cost (calling COM server interface methods and receiving results) will very likely measure about the same.

I think C++ clients have more optimization options in areas unrelated to COM, and this should be the focus of your optimization approach (such as introducing a local cache, as you already did).

9dan answered Oct 29 '22 19:10
