I have a huge data file (close to 4 TB) that I need to crunch. I am using 4 threads on my 4-core CPU: the first thread analyzes the first quarter of the file, and so on. All the threads need to add their results to the same single hash and single array after they have analyzed their own quarter of the data file. So, are the push, pop, shift, and unshift operations on hashes and arrays atomic and thread-safe, or do I have to resort to more complicated mechanisms like semaphores?
No, they are neither atomic nor thread-safe, and using them from multiple threads without synchronization will lead to crashes or data inconsistencies.
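For example, a quick (and deliberately unsafe) experiment along these lines usually ends up with fewer elements than expected, and some runs crash outright; the exact failure varies from run to run:

    my @shared;
    my @workers = (^4).map({
        start {
            # Each worker pushes 100_000 items with no synchronization.
            @shared.push($_) for ^100_000;
        }
    });
    await @workers;
    # Expected 400_000 elements; unsynchronized pushes usually leave
    # fewer, and some runs die with an internal error instead.
    say @shared.elems;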
That said, even if they were, a design that involves lots of contention on the same data structure will scale poorly as you add more threads. This is down to how the hardware behaves under parallelism: when multiple cores repeatedly write to the same memory, they keep invalidating each other's caches, so the contended updates end up serialized and the cores spend their time waiting rather than crunching data.
You can use locking to attain correctness. For this, I don't recommend working with a lock directly, but instead looking into a module like OO::Monitors, where you can encapsulate the hash in an object and have the locking done at the method boundaries.
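For example, a minimal sketch of that approach (the ResultStore name and its methods are just illustrative, not anything OO::Monitors itself provides):

    use OO::Monitors;

    # A monitor is declared like a class, but only one thread at a time
    # can be inside any of its methods; OO::Monitors adds the locking.
    monitor ResultStore {
        has %!totals;
        has @!records;

        method bump(Str $key, Int $by = 1) {
            %!totals{$key} += $by;
        }

        method add-record($record) {
            @!records.push($record);
        }

        method snapshot() {
            # Hand back copies so callers never touch the internals.
            %( totals => %!totals.clone, records => @!records.clone )
        }
    }

Each worker then calls add-record and bump on the one shared ResultStore instance, and the synchronization lives entirely inside the monitor.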
If the number of pushes you do on the shared data structure is low compared to the amount of work done to produce the items to push, then you might not bottleneck on the locking and contention around the data structure. If you are doing thousands of pushes or similar per second, however, I suggest looking for an alternative design. For example:
- Use start to set off each worker, which returns a Promise. Put the Promises into an array.
- Have each Promise return an array or hash of the items that it produced.
- Then my @all-results = flat await @promises; or similar is enough to gather all of the results together.

You might find your problem fits well into the parallel iterator paradigm, using hyper or race, in which case you don't even need to break up the work or set up the workers yourself; instead, you can pick a degree and batch size.
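A rough shape of that start/await design, where analyse-chunk is a placeholder for whatever per-quarter analysis the workers actually do:

    # analyse-chunk stands in for the real per-quarter work; each call
    # builds up and returns its own private list of result items.
    sub analyse-chunk(Int $quarter) {
        my @partial;
        # ... read and crunch quarter $quarter of the file here,
        #     pushing results onto @partial ...
        return @partial;
    }

    # One start block per quarter; each returns a Promise that will be
    # kept with that worker's own results.
    my @promises = (^4).map(-> $quarter { start analyse-chunk($quarter) });

    # No shared mutable state while the workers run; the merge happens
    # exactly once, here, after they have all finished.
    my @all-results = flat await @promises;

If the analysis can be expressed per record rather than per quarter, something along the lines of $fh.lines.race(:degree(4), :batch(4096)).map(&analyse-line), with analyse-line again a placeholder, lets the runtime handle the chunking and the workers for you.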