Is there an advantage in choosing either loop as an outer loop?

Tags: c++, performance, loops

I am extending an existing logging library. It has two sides: the frontend, into which tasks write their log messages, and the backend, where an application can plug in listeners that forward those messages to different sinks. The backend used to be a single hard-wired listener; I am now extending it for flexibility. The code is to be used exclusively on embedded devices, where high performance (measured in bytes forwarded per millisecond) is a very important design and implementation objective.

For performance reasons, messages are buffered, and forwarding is done in a background task. That task fetches a chunk of messages from a queue, formats them all, and then passes them to the listeners via registered functions. The listeners filter the messages and write only those that pass the filter criterion to their sink.

Given this, I end up having N notification functions (the listeners) to send M messages to, a rather classic N*M problem. Now I have two possibilities: I can loop over the messages, and then loop over the notification functions passing the message to each one.

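// Option 1: message loop outside; each listener receives one message per call.
for (const auto& m : formatted_messages)
    for (const auto& n : notification_functions)
        n(m);

// What each notification function n does (Message is a placeholder type):
void n(const Message& message)
{
    if (filter(message))
        write(message);
}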

Or I could loop over all the notification functions, and pass them all the messages I have at once:

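// Option 2: listener loop outside; each listener receives the whole chunk per call.
for (const auto& n : notification_functions)
    n(formatted_messages);

// What each notification function n does (Messages is a placeholder type):
void n(const Messages& messages)
{
    for (const auto& m : messages)
        if (filter(m))
            write(m);
}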

Are there any fundamental considerations regarding which design is more likely to allow a higher number of messages to be processed per time slice? (Note that this question determines the listener's interface. This isn't a micro-optimization question, but one about how to make a design that does not hinder performance. I can measure only much later, and redesigning the listener interface then will be costly.)

Some considerations I have already made:

Those listeners need to write the messages somewhere, which is rather expensive, so the function calls by themselves might not be too important performance-wise.
In 95% of all cases, there will only be one listener.


asked Jun 27 '13 17:06

sbi

1 Answer

Are there any fundamental considerations regarding which design is more likely to allow a higher number of messages to be processed per time slice?

In general, this boils down to two considerations:

If one of the loops iterates over objects with good memory locality (such as an array of values), keeping that loop on the inside can keep those objects in the CPU cache and improve performance.
If you plan to parallelize the operation, keeping the "larger" collection (in terms of count) in the outer loop lets you parallelize the outer loop effectively without oversubscribing threads. It is typically simpler and cleaner to parallelize an algorithm at the outer level, so designing the loops with the potentially larger parallel "blocks" of work at the outer loop can simplify this, if it becomes a possibility later.
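To make the difference between the two orders concrete, here is a minimal C++ sketch that only records the order in which (listener, message) pairs are visited. The names `Visits`, `message_outer`, and `listener_outer` are made up for this illustration and are not part of the library in question. Both orders visit the same N*M pairs; what changes is which index varies fastest, and therefore which collection is walked contiguously in the inner loop.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// (listener index, message index) pairs in the order they are visited.
// All names here are hypothetical and exist only for illustration.
using Visits = std::vector<std::pair<std::size_t, std::size_t>>;

// Message loop outside: each message is handed to every listener in turn,
// so the listener index varies fastest.
Visits message_outer(std::size_t listeners, std::size_t messages) {
    Visits visits;
    for (std::size_t m = 0; m < messages; ++m)
        for (std::size_t n = 0; n < listeners; ++n)
            visits.emplace_back(n, m);
    return visits;
}

// Listener loop outside: each listener walks the whole message chunk,
// so the message index varies fastest.
Visits listener_outer(std::size_t listeners, std::size_t messages) {
    Visits visits;
    for (std::size_t n = 0; n < listeners; ++n)
        for (std::size_t m = 0; m < messages; ++m)
            visits.emplace_back(n, m);
    return visits;
}
```

If the formatted messages sit in one contiguous buffer, the second order is the one that walks that buffer sequentially in its inner loop. Whether that matters in practice depends on the target's cache and on how expensive the writes are.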

Those listeners need to write the messages somewhere, which is rather expensive, so the function calls by themselves might not be too important performance-wise.

If the writes dominate the cost, this will probably negate any benefit of choosing one loop order over the other.

In 95% of all cases, there will only be one listener.

If this is the case, I would likely put the listener loop at the outer scope, unless you plan to parallelize this operation. Given that this is going to run in a background thread on an embedded device, parallelizing is unlikely, so having the listener loop as the outer loop should reduce the overall instruction count (with a single listener, it effectively becomes one loop over M messages instead of M loops over a single listener).
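As a rough sketch of what a listener interface with the listener loop outside could look like: everything below (`Message`, `Listener`, `SeverityListener`, `dispatch`) is a hypothetical stand-in, since the question does not show the library's real types, and the in-memory sink merely replaces the expensive write. The point it illustrates is that the dispatcher makes one indirect call per listener per chunk, and the per-message loop lives inside the listener.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical message type; the real library's type is not shown in the question.
struct Message {
    int severity;
    std::string text;
};

// Hypothetical batch interface: a listener receives a whole chunk per call.
class Listener {
public:
    virtual ~Listener() = default;
    virtual void notify(const std::vector<Message>& messages) = 0;
};

// Example listener that filters by severity and "writes" to an in-memory sink.
class SeverityListener : public Listener {
public:
    explicit SeverityListener(int min_severity) : min_severity_(min_severity) {}

    void notify(const std::vector<Message>& messages) override {
        ++calls_;
        for (const Message& m : messages) {       // inner loop over the chunk
            if (m.severity >= min_severity_) {    // filter
                sink_.push_back(m.text);          // stand-in for the real write
            }
        }
    }

    std::size_t calls() const { return calls_; }
    const std::vector<std::string>& sink() const { return sink_; }

private:
    int min_severity_;
    std::size_t calls_ = 0;
    std::vector<std::string> sink_;
};

// Listener loop outside: N virtual calls per chunk, independent of M.
void dispatch(const std::vector<Listener*>& listeners,
              const std::vector<Message>& messages) {
    for (Listener* listener : listeners) {
        assert(listener != nullptr);
        listener->notify(messages);
    }
}
```

With one listener and a chunk of M messages, `dispatch` makes a single virtual call, whereas a per-message interface would make M. An embedded build might prefer a pointer-plus-length pair over `std::vector` and `std::string`; those containers are used here only to keep the sketch short.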


answered Oct 02 '22 15:10

Reed Copsey
