I would like to know if there is a good way to monitor my application internals, ideally in the form of an existing library.
My application is heavily multithreaded and uses a messaging system to communicate between threads and with the external world. My goal is to monitor what kinds of messages are sent, at what frequency, etc.
There could also be other, more general statistics, like how many threads are spawned every minute or how many times new/delete are called, as well as more specific aspects of the application; you name it.
What would be awesome is something like the "internal pages" you have in Google Chrome, like chrome://net-internals or chrome://tracing, but in a command-line fashion.
If there is a library that's generic enough to accommodate the specifics of my app, that would be great.
Otherwise I'm prepared to implement a small class that would do the job, but I don't know where to start. I think the most important thing is that the code shouldn't interfere too much, so that performance is not impacted.
Do you guys have some pointers on this matter?
Edit: my application runs on Linux, in an embedded environment, sadly not supported by Valgrind :(
I would recommend that in your code, you maintain counters that get incremented. The counters can be static class members or globals. If you use a class to define your counter, you can have the constructor register your counter with a single repository along with a name. Then you can query and reset your counters by consulting the repository.
struct Counter {
    unsigned long c_;

    unsigned long operator++ () { return ++c_; }
    operator unsigned long () const { return c_; }

    // Reset by subtracting the value we just read, so increments that race
    // with the reset are not lost.
    void reset () { unsigned long c = c_; ATOMIC_DECREMENT(c_, c); }

    Counter (std::string name);
};
struct CounterAtomic : public Counter {
    // Same counter, but incremented atomically; safe to share between threads.
    unsigned long operator++ () { return ATOMIC_INCREMENT(c_, 1); }

    CounterAtomic (std::string name) : Counter(name) {}
};
ATOMIC_INCREMENT would be a platform-specific mechanism to increment the counter atomically; GCC provides the built-in __sync_add_and_fetch for this purpose. ATOMIC_DECREMENT is similar, with the GCC built-in __sync_sub_and_fetch.
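For illustration, here is a minimal sketch of how those placeholder macros could be defined on top of the GCC builtins (the macro names are just the ones used above, not part of any library):

    // Possible definitions of the placeholder macros, assuming the GCC
    // __sync builtins are available on the target toolchain.
    #define ATOMIC_INCREMENT(var, amount) __sync_add_and_fetch(&(var), (amount))
    #define ATOMIC_DECREMENT(var, amount) __sync_sub_and_fetch(&(var), (amount))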
struct CounterRepository {
    typedef std::map<std::string, Counter *> MapType;

    mutable Mutex lock_;   // Mutex and ScopedLock stand in for your platform's locking primitives
    MapType map_;

    // Register a counter under a unique name; throws if the name is already taken.
    void add (std::string n, Counter &c) {
        ScopedLock<Mutex> sl(lock_);
        if (map_.find(n) != map_.end()) throw n;
        map_[n] = &c;
    }

    // Look up a counter by name; throws if it is unknown.
    Counter & get (std::string n) const {
        ScopedLock<Mutex> sl(lock_);
        MapType::const_iterator i = map_.find(n);
        if (i == map_.end()) throw n;
        return *(i->second);
    }
};
CounterRepository counterRepository;
Counter::Counter (std::string name) : c_(0) {   // start at zero and self-register
    counterRepository.add(name, *this);
}
If you know the same counter will be incremented by more than one thread, then use CounterAtomic. For counters that are specific to a thread, just use Counter.
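As a usage sketch (the counter names here are made up for illustration):

    // Counters register themselves with the repository when constructed.
    static CounterAtomic messagesSent("messages.sent");  // shared by several threads
    static Counter timerTicks("timer.ticks");            // touched by one thread only

    void onMessageSent () {
        ++messagesSent;
    }

    // Elsewhere, e.g. in a small command-line monitoring thread, the implicit
    // conversion to unsigned long reads the current value:
    unsigned long sent = counterRepository.get("messages.sent");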
I gather you are trying to implement the gathering of run-time statistics -- things like how many bytes you sent, how long you've been running, and how many times the user has activated a particular function.
Typically, in order to compile run-time statistics such as these from a variety of sources (like worker threads), I would have each source (thread) increment its own, local counters of the most fundamental data but not perform any lengthy math or analysis on that data yet.
Then, back in the main thread (or wherever you want these stats analyzed and displayed), I send a RequestProgress-type message to each of the worker threads. In response, the worker threads gather up all the fundamental data and perhaps perform some simple analysis. This data, along with the results of the basic analysis, is sent back to the requesting (main) thread in a ProgressReport message. The main thread then aggregates all this data, does additional (perhaps costly) analysis, formatting, and display to the user or logging.
The main thread sends this RequestProgress message either on user request (like when they press the S key), or on a timed interval. If a timed interval is what I'm going for, I'll typically implement another new "heartbeat" thread. All this thread does is Sleep() for a specified time, then send a Heartbeat message to the main thread. The main thread in turn acts on this Heartbeat message by sending RequestProgress messages to every worker thread the statistics are to be gathered from.
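A minimal sketch of such a heartbeat thread, assuming a hypothetical postToMainThread() helper since your messaging system isn't shown (on Linux I use POSIX sleep() rather than Sleep()):

    #include <unistd.h>   // sleep()

    // Hypothetical message type and posting helper; substitute your own
    // messaging primitives here.
    struct Heartbeat {};
    void postToMainThread (const Heartbeat &msg);

    void heartbeatThread ()
    {
        for (;;) {
            sleep(5);                        // snapshot interval, in seconds
            postToMainThread(Heartbeat());   // main thread reacts by broadcasting RequestProgress
        }
    }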
The act of gathering statistics seems like it should be fairly straightforward. So why such a complex mechanism? The answer is two-fold.
First, the worker threads have a job to do, and computing usage statistics isn't it. Trying to refactor these threads to take on a second responsibility orthogonal to their main purpose is a little like trying to jam a square peg into a round hole. They weren't built to do that, so the code will resist being written.
Second, the computation of run-time statistics can be costly if you try to do too much, too often. Suppose for example you have a worker thread that sends multicast data on the network, and you want to gather throughput data: how many bytes, over how long a time period, and an average of how many bytes per second. You could have the worker thread compute all this on the fly itself, but it's a lot of work and that CPU time is better spent by the worker thread doing what it's supposed to be doing -- sending multicast data. If instead you simply increment a counter for how many bytes you've sent every time you send a message, the counting has minimal impact on the performance of the thread. Then, in response to the occasional RequestProgress message, you can figure out the start and stop times and send just that along, letting the main thread do all the division etc.
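As a rough sketch of that division of labour (the message types, the byte counter, and the clock calls below are hypothetical stand-ins for whatever your messaging system and timing facilities provide):

    #include <ctime>

    // Hypothetical message types for the request/report exchange.
    struct RequestProgress {};
    struct ProgressReport {
        unsigned long bytesSent;
        time_t        periodStart;
        time_t        periodEnd;
    };

    // In the worker thread: cheap bookkeeping only.
    static unsigned long g_bytesSent   = 0;
    static time_t        g_periodStart = time(0);

    void onSend (unsigned long len) {
        g_bytesSent += len;                 // just count; no averaging here
    }

    // When a RequestProgress message arrives, snapshot the counters and reply.
    ProgressReport makeReport () {
        ProgressReport r;
        r.bytesSent   = g_bytesSent;
        r.periodStart = g_periodStart;
        r.periodEnd   = time(0);
        return r;
    }

    // In the main thread: the (possibly costly) math happens here.
    double bytesPerSecond (const ProgressReport &r) {
        double secs = difftime(r.periodEnd, r.periodStart);
        return secs > 0 ? r.bytesSent / secs : 0.0;
    }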
Use shared memory (POSIX, System V, mmap or whatever you have available). Put a fixed length array of volatile unsigned 32- or 64-bit integers (i.e. the largest you can atomically increment on your platform) in there by casting the raw block of memory to your array definition. Note that the volatile doesn't get you atomicity; it prevents compiler optimizations that might trash your stats values. Use intrinsics like gcc's __sync_add_and_fetch() or the newer C++11 atomic<> types.
You can then write a small program that attaches to the same block of shared memory and can print out one or all stats. This small stats reader program and your main program would have to share a common header file that enforces the position of each stat in the array.
The obvious drawback here is that you're stuck with a fixed number of counters. But it's hard to beat, performance-wise. The impact is the atomic increment of an integer at various points in your program.
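A minimal sketch of that approach, assuming POSIX shm_open()/mmap() (you may need to link with -lrt on older glibc); the segment name, the StatIndex entries, and the helper function are all made up for illustration:

    // stats_shm.h -- shared between the application and the stats reader tool.
    #include <stdint.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    enum StatIndex { STAT_MSGS_SENT, STAT_BYTES_SENT, STAT_THREADS_SPAWNED, STAT_COUNT };

    // Map (creating it if necessary) the shared counter array; a fresh segment
    // is zero-filled, so all counters start at 0.
    inline volatile uint64_t *openStats ()
    {
        int fd = shm_open("/myapp_stats", O_CREAT | O_RDWR, 0666);
        if (fd < 0) return 0;
        if (ftruncate(fd, STAT_COUNT * sizeof(uint64_t)) < 0) { close(fd); return 0; }
        void *p = mmap(0, STAT_COUNT * sizeof(uint64_t),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);
        return p == MAP_FAILED ? 0 : static_cast<volatile uint64_t *>(p);
    }

    // In the application:
    //   volatile uint64_t *stats = openStats();
    //   __sync_add_and_fetch(&stats[STAT_MSGS_SENT], 1);
    // The reader tool maps the same segment and simply prints stats[i].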