Parallel fill std::vector with zero

1 Answer

You can split the vector into chunks for each thread to be filled with std::fill:

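// Needs <algorithm> and <omp.h>; compile with OpenMP enabled (e.g. -fopenmp for gcc).
#pragma omp parallel
{
    // Each thread fills one contiguous chunk; the last thread also takes the remainder.
    auto tid = omp_get_thread_num();
    auto chunksize = v.size() / omp_get_num_threads();
    auto begin = v.begin() + chunksize * tid;
    auto end = (tid == omp_get_num_threads() - 1) ? v.end() : begin + chunksize;
    std::fill(begin, end, 0);
}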

You can further improve it by rounding chunksize to a multiple of the cache line / memory word size (128 bytes = 32 ints), assuming that v.data() is aligned to the same boundary. That way, no two threads write to the same cache line, so you avoid any false sharing issues.
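A minimal sketch of that rounding, assuming 4-byte ints and the 128-byte granule mentioned above. The names `kGranuleBytes`, `rounded_chunk` and `parallel_zero` are made up for illustration. The chunk size is rounded up rather than to the nearest multiple, so every element is covered and trailing threads may simply get an empty range. The alignment of v.data() is not checked here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Assumption: 128-byte granule, as in the text above (32 four-byte ints).
constexpr std::size_t kGranuleBytes = 128;

// Hypothetical helper: elements per thread, rounded up to a whole number of granules.
std::size_t rounded_chunk(std::size_t n, std::size_t threads, std::size_t elem_size) {
    assert(threads > 0 && elem_size > 0 && kGranuleBytes % elem_size == 0);
    const std::size_t per_granule = kGranuleBytes / elem_size;
    const std::size_t raw = (n + threads - 1) / threads;         // ceil(n / threads)
    return (raw + per_granule - 1) / per_granule * per_granule;  // round up to a granule multiple
}

void parallel_zero(std::vector<int>& v) {
    #pragma omp parallel
    {
#ifdef _OPENMP
        const auto tid = static_cast<std::size_t>(omp_get_thread_num());
        const auto nthreads = static_cast<std::size_t>(omp_get_num_threads());
#else
        // Without OpenMP the pragma is ignored and one thread fills everything.
        const std::size_t tid = 0, nthreads = 1;
#endif
        const std::size_t chunk = rounded_chunk(v.size(), nthreads, sizeof(int));
        // Clamp both ends: with rounded-up chunks the last threads may have nothing to do.
        const std::size_t first = std::min(v.size(), chunk * tid);
        const std::size_t last = std::min(v.size(), first + chunk);
        std::fill(v.begin() + first, v.begin() + last, 0);
    }
}
```

Chunk boundaries then only fall on cache line boundaries if v.data() itself is aligned to 128 bytes, which std::vector does not guarantee by default.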

On a dual socket 24 core Haswell system, I get a speedup of roughly 9x: from 3.6 s with 1 thread to 0.4 s with 24 threads. With 4.8 billion ints (19.2 GB at 4 bytes each), that is about 48 GB/s. The results vary a bit and this is not a scientific analysis, but it is not too far off the memory bandwidth of the system.

For general performance, you should try to divide your vector the same way not only for this operation, but also for further operations on it (be it read or write), if possible. That way, you increase the chance that the data a thread needs is already in its cache, or at least on the same NUMA node.
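One way to keep the split consistent is to move the chunk computation into a helper and reuse it for every pass over the vector. The sketch below does that. The names `thread_range`, `for_each_chunk` and `fill_then_increment` are made up for illustration. It assumes both parallel regions run with the same number of threads, which OpenMP does not strictly guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: the [first, last) index range owned by thread `tid`,
// using the same split as the fill above (the last thread takes the remainder).
std::pair<std::size_t, std::size_t> thread_range(std::size_t n, std::size_t tid,
                                                 std::size_t nthreads) {
    assert(nthreads > 0 && tid < nthreads);
    const std::size_t chunk = n / nthreads;
    const std::size_t first = chunk * tid;
    const std::size_t last = (tid == nthreads - 1) ? n : first + chunk;
    return {first, last};
}

// Calls body(first, last) once per thread, on that thread's range.
template <typename Body>
void for_each_chunk(std::size_t n, Body body) {
    #pragma omp parallel
    {
#ifdef _OPENMP
        const auto tid = static_cast<std::size_t>(omp_get_thread_num());
        const auto nthreads = static_cast<std::size_t>(omp_get_num_threads());
#else
        // Without OpenMP the pragma is ignored and one thread handles everything.
        const std::size_t tid = 0, nthreads = 1;
#endif
        const auto range = thread_range(n, tid, nthreads);
        body(range.first, range.second);
    }
}

// The initial fill and the later update both go through the same split, so
// (given equal thread counts) each thread revisits the elements it filled.
void fill_then_increment(std::vector<int>& v) {
    for_each_chunk(v.size(), [&](std::size_t first, std::size_t last) {
        std::fill(v.begin() + first, v.begin() + last, 0);
    });
    for_each_chunk(v.size(), [&](std::size_t first, std::size_t last) {
        for (std::size_t i = first; i < last; ++i) v[i] += 1;
    });
}
```

Whether a thread actually finds its data in cache or on its own NUMA node also depends on thread pinning and on which thread first touched the memory, neither of which this sketch controls.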

Oddly enough, on my system std::fill(..., 1); is faster than std::fill(..., 0) for a single thread, but slower for 24 threads, with both gcc 6.1.0 and icc 17.0.1. I guess I'll post that as a separate question.

129

answered Oct 27 '22 22:10

Zulan

Tags:

c++

parallel-processing

vector

openmp

hamster on wheels
