I have a huge data file (close to 4 TB) that I need to crunch. I am using 4 threads on my 4-core CPU: the first thread analyzes the first quarter of the file, and so on. All the threads need to add their results to the same single hash and single array after they have analyzed their own quarter of the data file. So, are the push, pop, shift, and unshift operations on hashes and arrays atomic and thread-safe, or do I have to resort to more complicated mechanisms like semaphores?
No, they are neither atomic nor thread-safe, and using them from multiple threads without synchronization will lead to crashes or data inconsistencies.
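For example, a quick (and deliberately unsafe) experiment along these lines usually ends up with fewer elements than expected, and some runs crash outright; the exact failure varies from run to run:

    my @shared;
    my @workers = (^4).map({
        start {
            # Each worker pushes 100_000 items with no synchronization.
            @shared.push($_) for ^100_000;
        }
    });
    await @workers;
    # Expected 400_000 elements; unsynchronized pushes usually leave
    # fewer, and some runs die with an internal error instead.
    say @shared.elems;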
That said, even if they were, a design that involves lots of contention on the same data structure will scale poorly as you add more threads. This is down to how the hardware behaves under parallelism: when multiple cores repeatedly write to the same memory, they keep invalidating each other's caches, so the contended updates end up serialized and the cores spend their time waiting rather than crunching data.
You can use locking to attain correctness. For this, I don't recommend working with a lock directly, but instead looking into a module like OO::Monitors, where you can encapsulate the hash in an object and have the locking done at the method boundaries.
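For example, a minimal sketch of that approach (the ResultStore name and its methods are just illustrative, not anything OO::Monitors itself provides):

    use OO::Monitors;

    # A monitor is declared like a class, but only one thread at a time
    # can be inside any of its methods; OO::Monitors adds the locking.
    monitor ResultStore {
        has %!totals;
        has @!records;

        method bump(Str $key, Int $by = 1) {
            %!totals{$key} += $by;
        }

        method add-record($record) {
            @!records.push($record);
        }

        method snapshot() {
            # Hand back copies so callers never touch the internals.
            %( totals => %!totals.clone, records => @!records.clone )
        }
    }

Each worker then calls add-record and bump on the one shared ResultStore instance, and the synchronization lives entirely inside the monitor.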
If the number of pushes you do on the shared data structure is low compared to the amount of work done to produce the items to push, then you might not bottleneck on the locking and contention around the data structure. If you are doing thousands of pushes or similar per second, however, I suggest looking for an alternative design. For example:
- Use start to set off each worker, which returns a Promise. Put the Promises into an array.
- Have each Promise return an array or hash of the items that it produced.
- Then my @all-results = flat await @promises; or similar is enough to gather all of the results together.

You might find your problem fits well into the parallel iterator paradigm, using hyper or race, in which case you don't even need to break up the work or set up the workers yourself; instead, you can pick a degree and batch size.
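A rough shape of that start/await design, where analyse-chunk is a placeholder for whatever per-quarter analysis the workers actually do:

    # analyse-chunk stands in for the real per-quarter work; each call
    # builds up and returns its own private list of result items.
    sub analyse-chunk(Int $quarter) {
        my @partial;
        # ... read and crunch quarter $quarter of the file here,
        #     pushing results onto @partial ...
        return @partial;
    }

    # One start block per quarter; each returns a Promise that will be
    # kept with that worker's own results.
    my @promises = (^4).map(-> $quarter { start analyse-chunk($quarter) });

    # No shared mutable state while the workers run; the merge happens
    # exactly once, here, after they have all finished.
    my @all-results = flat await @promises;

If the analysis can be expressed per record rather than per quarter, something along the lines of $fh.lines.race(:degree(4), :batch(4096)).map(&analyse-line), with analyse-line again a placeholder, lets the runtime handle the chunking and the workers for you.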