
How to structure a C++ application to use a multicore processor

Tags: c++, multicore

I am building an application that will do some object tracking from a video camera feed and use that information to drive a particle system in OpenGL. The code that processes the video feed is somewhat slow, 200-300 milliseconds per frame right now. The system this will run on has a dual-core processor. To maximize performance I want to offload the camera processing to one core and just communicate relevant data back to the main application as it becomes available, while the main application keeps running on the other core.

What do I need to do to offload the camera work to the other processor and how do I handle communication with the main application?

Edit: I am running Windows 7 64-bit.

asked Jan 30 '10 by Mr Bell




2 Answers

Basically, you need to multithread your application. Each thread of execution can only saturate one core, and separate threads tend to be run on separate cores. If you insist that a given thread always run on a specific core, each operating system has its own way of specifying that (affinity masks and such)... but I wouldn't recommend it.
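For completeness, since the question targets Windows 7: pinning the current thread to a specific core would look roughly like the sketch below, using the Win32 SetThreadAffinityMask call. This is only an illustration of what "affinity masks & such" means; as noted above, it's usually unnecessary.

#include <windows.h>

// Restrict the calling thread to core 0 (bit i of the mask = allowed to run on core i).
// Normally you should let the scheduler spread threads across cores on its own.
void pin_current_thread_to_core0()
{
    DWORD_PTR previous = SetThreadAffinityMask(GetCurrentThread(), 1);
    if (previous == 0) {
        // the call failed; the thread keeps its old affinity
    }
}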

OpenMP is great, but it's a bit heavyweight, especially when joining back up after a parallel region. YMMV. It's easy to use, but not at all the best-performing option. It also requires compiler support.

If you're on Mac OS X 10.6 (Snow Leopard), you can use Grand Central Dispatch. It's interesting to read about even if you don't use it, as its design implements some best practices. It isn't optimal either, but it's better than OpenMP, even though it also requires compiler support.

If you can wrap your head around breaking up your application into "tasks" or "jobs," you can shove these jobs down as many pipes as you have cores. Think of batching your processing as atomic units of work. If you can segment it properly, you can run your camera processing on both cores, and your main thread at the same time.

If communication is minimized for each unit of work, then your need for mutexes and other locking primitives will be minimized too. Coarse-grained threading is much easier than fine-grained. And you can always use a library or framework to ease the burden. Consider Boost's Thread library if you take the manual approach. It provides portable wrappers and a nice abstraction.
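If you take that manual approach, a rough sketch of the pattern looks like the code below. It uses the standard C++11 thread classes, whose interface Boost.Thread closely mirrors; TrackingResult, camera_worker, and drain_results are made-up names for illustration, not anything from the question.

#include <atomic>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical result produced by the camera-processing code.
struct TrackingResult { float x, y; };

std::queue<TrackingResult> results;   // work finished by the camera thread
std::mutex results_mutex;             // guards the queue
std::atomic<bool> done{false};

// Runs on its own core: process frames and push results as they become available.
void camera_worker()
{
    while (!done) {
        TrackingResult r{};           // ...analyze one camera frame here (the 200-300 ms part)...
        std::lock_guard<std::mutex> lock(results_mutex);
        results.push(r);
    }
}

// Called from the main/render loop: grab whatever is ready, never block on the camera.
void drain_results()
{
    std::lock_guard<std::mutex> lock(results_mutex);
    while (!results.empty()) {
        TrackingResult r = results.front();
        results.pop();
        // feed r into the particle system here
    }
}

// Startup: std::thread worker(camera_worker);  Shutdown: done = true; worker.join();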

answered Sep 20 '22 by pestilence669


It depends on how many cores you have. If you have only 2 cores (CPUs, processors, hyperthreads, you know what I mean), then OpenMP cannot give a tremendous increase in performance, but it will help. The most you can gain is to divide the parallelizable work by the number of processors, so it will still take 100-150 ms per frame.

The equation is:
parallel time = (([total time to perform the task] - [time in code that cannot be parallelized]) / [number of CPUs]) + [time in code that cannot be parallelized]
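To plug in the numbers from the question (assuming, optimistically, that essentially all of the 200-300 ms of frame processing can be parallelized, i.e. the serial part is close to zero):

parallel time = (300 ms - 0 ms) / 2 + 0 ms = 150 ms
parallel time = (200 ms - 0 ms) / 2 + 0 ms = 100 ms

which is where the 100-150 ms per frame figure comes from; any serial portion pushes that number back up.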

Basically, OpenMP rocks at parallelizing loops. It's rather easy to use:

// The iterations are split across the available cores; the loop index is private to each thread.
#pragma omp parallel for
for (int i = 0; i < N; i++)
    a[i] = 2 * i;

and bang, your for loop is parallelized. It does not work for every case, and not every algorithm can be parallelized this way, but many can be rewritten (hacked) to be compatible. The key principle is Single Instruction, Multiple Data (SIMD): applying the same convolution code to multiple pixels, for example.
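Applied to the camera case, the same idea might look like the sketch below, where frame, output, width, height, and process_pixel are placeholders I've invented for whatever the real per-pixel work is:

// Split the rows of one video frame across the available cores.
#pragma omp parallel for
for (int y = 0; y < height; y++)
    for (int x = 0; x < width; x++)
        output[y * width + x] = process_pixel(frame[y * width + x]);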

But simply applying this cookbook recipe goes against the rules of optimization:
1. Benchmark your code.
2. Find the REAL bottlenecks with "scientific" evidence (numbers) instead of simply guessing where you think the bottleneck is.
3. If it really is the processing loops, then OpenMP is for you.

Maybe simple optimizations of your existing code will give better results; who knows?

Another road would be to run OpenGL in one thread and the data processing in another. This will help a lot if OpenGL or your particle rendering system takes a lot of power, but remember that threading can lead to other kinds of synchronization bottlenecks.
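A minimal shape for that split, assuming the render thread only ever needs the most recent tracking result (the names here are invented for illustration, not anything from the question):

#include <atomic>
#include <mutex>
#include <thread>

struct TrackingData { float x, y; };

TrackingData latest{};              // most recent result from the processing thread
std::mutex latest_mutex;
std::atomic<bool> running{true};

void processing_thread()            // slow path: 200-300 ms per camera frame
{
    while (running) {
        TrackingData d{};           // ...analyze one frame here...
        std::lock_guard<std::mutex> lock(latest_mutex);
        latest = d;                 // publish the newest result
    }
}

void render_frame()                 // fast path: called from the OpenGL/render thread
{
    TrackingData d;
    {
        std::lock_guard<std::mutex> lock(latest_mutex);
        d = latest;                 // copy under the lock, render without holding it
    }
    // drive the particle system with d and draw with OpenGL
}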

answered Sep 20 '22 by Eric