
Are push and pop operations for hashes and arrays atomic and thread-safe?

I have a huge data file (close to 4T) that I need to crunch. I am using 4 threads on my 4-core CPU: the first thread analyzes the first quarter of the file, and so on. All the threads need to add their results to the same single hash and single array after they have analyzed sections of their own quarter of the data file. So, are the "push", "pop", "shift", and "unshift" operations for hashes and arrays atomic and thread-safe, or do I have to resort to more complicated mechanisms like semaphores?
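
For illustration, here is a rough sketch of the kind of code I mean (stand-in data instead of the real 4T file; the real per-record analysis is omitted):

    # Four workers, each crunching one quarter of the records and pushing
    # into the same shared hash and array.
    my @records = ('a'..'z').roll(40_000);
    my %results;          # shared hash
    my @shared;           # shared array

    my @workers = @records.rotor(10_000).map: -> @quarter {
        start {
            for @quarter -> $record {
                %results{$record}++;     # is this safe from several threads?
                @shared.push($record);   # and this?
            }
        }
    };
    await @workers;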

lisprogtor asked Apr 21 '20 22:04


1 Answer

No, they are neither atomic nor thread-safe, and using them from multiple threads will lead to crashes or data inconsistencies.
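
A quick way to see this (a minimal sketch, not from the original answer) is to push to a plain Array from several threads and count what arrives:

    # Concurrent pushes to a plain Array: the final count usually falls
    # short of 40_000, and the program may also die outright.
    my @results;
    await (^4).map: { start { @results.push($_) for ^10_000 } };
    say @results.elems;   # expected 40_000; typically less, or a crash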

That said, even if they were, a design that involves lots of contention on the same data structure will scale poorly as you add more threads. This is because of the way hardware works in the face of parallelism; briefly:

  • Memory performance is heavily dependent on caches
  • Some cache levels are per CPU core
  • Writing to memory means getting it exclusively into the current core's cache
  • Moving data from one core's cache to another in order to write to it is costly (ballpark 60-100 cycle penalty)

You can use locking to attain correctness. For this, I don't recommend working with a lock directly, but instead suggest looking into a module like OO::Monitors, where you can encapsulate the hash in an object and have the locking done at its boundaries.
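
A minimal sketch of that approach (the ResultStore name and its methods are just illustrative, assuming the OO::Monitors module is installed):

    use OO::Monitors;

    # A monitor is like a class, except that only one thread at a time may
    # be inside its methods, so the wrapped array and hash are used safely.
    monitor ResultStore {
        has @!items;
        has %!counts;

        method add-item($item) { @!items.push($item) }
        method bump(Str $key)  { %!counts{$key}++    }
        method items()         { @!items.List        }
        method counts()        { %!counts.Map        }
    }

    my $store = ResultStore.new;
    await (^4).map: { start { $store.add-item($_) for ^10_000 } };
    say $store.items.elems;   # 40_000 every time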

If the number of pushes you do on the shared data structure is low compared to the amount of work done to produce the items to push, then you might not bottleneck on the locking and contention around the data structure. If you are doing thousands of pushes or similar per second, however, I suggest looking for an alternative design. For example:

  1. Break the work up into a part for each worker
  2. Use start to set off each worker, which returns a Promise. Put the Promises into an array.
  3. Have each Promise return an array or hash of the items that it produced.
  4. Merge the results from each one. For example, if each returns an array, then my @all-results = flat await @promises; or similar is enough to gather all of the results together (a sketch of the whole pattern follows this list).
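
Here is a minimal sketch of that pattern (analyze-chunk and the sample chunks are just placeholders for the real per-quarter work):

    # Each worker builds up its own local array; results are merged only
    # once, after all of the Promises have completed.
    sub analyze-chunk($chunk) {
        my @local;
        @local.push($_) for $chunk.comb(/\w+/);   # stand-in for real analysis
        @local
    }

    my @chunks   = 'alpha beta', 'gamma delta', 'epsilon zeta', 'eta theta';
    my @promises = @chunks.map: -> $chunk { start analyze-chunk($chunk) };
    my @all-results = flat await @promises;
    say @all-results;   # every worker's items, with no writes to shared state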

You might find your problem fits well into the parallel iterator paradigm, using hyper or race, in which case you don't even need to break up the work or set up the workers yourself; instead, you can pick a degree and batch size.
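
For instance, a rough sketch using hyper (the file name and the per-line analysis are placeholders):

    # hyper preserves input order, race does not; :degree sets the number of
    # workers and :batch how many items each worker takes at a time.
    sub analyze-line(Str $line) { $line.chars }   # stand-in analysis

    my @results = 'huge-data-file.txt'.IO.lines
        .hyper(:degree(4), :batch(1024))
        .map(&analyze-line);

    say @results.elems;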

Jonathan Worthington answered Sep 30 '22 13:09