Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to share a HashMap between threads without locking the entire HashMap?

Tags:

I would like to have a shared struct between threads. The struct has many fields that are never modified and a HashMap, which is. I don't want to lock the whole HashMap for a single update/remove, so my HashMap looks something like HashMap<u8, Mutex<u8>>. This works, but it makes no sense since the thread will lock the whole map anyways.

Here's this working version, without threads; I don't think that's necessary for the example.

use std::collections::HashMap; use std::sync::{Arc, Mutex};  fn main() {     let s = Arc::new(Mutex::new(S::new()));     let z = s.clone();     let _ = z.lock().unwrap(); }  struct S {     x: HashMap<u8, Mutex<u8>>, // other non-mutable fields }  impl S {     pub fn new() -> S {         S {             x: HashMap::default(),         }     } } 

Playground

Is this possible in any way? Is there something obvious I missed in the documentation?

I've been trying to get this working, but I'm not sure how. Basically every example I see there's always a Mutex (or RwLock, or something like that) guarding the inner value.

like image 468
Andrew Avatar asked May 10 '18 22:05

Andrew


People also ask

Can multiple threads read the same HashMap?

By default ConcurrentHashMap has segment array size as 16 so simultaneously 16 Threads can put data in map considering each thread is working on separate Segment array index.

How can a HashMap be used in a multithreaded environment?

You must make sure: All of the updates to the HashMap are completed before the threads are instantiated and the thread that creates the map also forks the threads. The threads are only using the HashMap in read-only mode – either get() or iteration without remove. There are no threads updating the map.

How do I make a HashMap thread-safe?

You can make HashMap thread safe by wrapping it with Collections. synchronizedMap() . What's the difference? @naXa ConcurrentHashMap allows concurrent access and a synchronized one doesn't.

Are HashMap thread-safe?

Java HashMap is not thread-safe and hence it should not be used in multithreaded applications.


1 Answers

I don't see how your request is possible, at least not without some exceedingly clever lock-free data structures; what should happen if multiple threads need to insert new values that hash to the same location?

In previous work, I've used a RwLock<HashMap<K, Mutex<V>>>. When inserting a value into the hash, you get an exclusive lock for a short period. The rest of the time, you can have multiple threads with reader locks to the HashMap and thus to a given element. If they need to mutate the data, they can get exclusive access to the Mutex.

Here's an example:

use std::{     collections::HashMap,     sync::{Arc, Mutex, RwLock},     thread,     time::Duration, };  fn main() {     let data = Arc::new(RwLock::new(HashMap::new()));      let threads: Vec<_> = (0..10)         .map(|i| {             let data = Arc::clone(&data);             thread::spawn(move || worker_thread(i, data))         })         .collect();      for t in threads {         t.join().expect("Thread panicked");     }      println!("{:?}", data); }  fn worker_thread(id: u8, data: Arc<RwLock<HashMap<u8, Mutex<i32>>>>) {     loop {         // Assume that the element already exists         let map = data.read().expect("RwLock poisoned");          if let Some(element) = map.get(&id) {             let mut element = element.lock().expect("Mutex poisoned");              // Perform our normal work updating a specific element.             // The entire HashMap only has a read lock, which             // means that other threads can access it.             *element += 1;             thread::sleep(Duration::from_secs(1));              return;         }          // If we got this far, the element doesn't exist          // Get rid of our read lock and switch to a write lock         // You want to minimize the time we hold the writer lock         drop(map);         let mut map = data.write().expect("RwLock poisoned");          // We use HashMap::entry to handle the case where another thread          // inserted the same key while where were unlocked.         thread::sleep(Duration::from_millis(50));         map.entry(id).or_insert_with(|| Mutex::new(0));         // Let the loop start us over to try again     } } 

This takes about 2.7 seconds to run on my machine, even though it starts 10 threads that each wait for 1 second while holding the exclusive lock to the element's data.

This solution isn't without issues, however. When there's a huge amount of contention for that one master lock, getting a write lock can take a while and completely kills parallelism.

In that case, you can switch to a RwLock<HashMap<K, Arc<Mutex<V>>>>. Once you have a read or write lock, you can then clone the Arc of the value, returning it and unlocking the hashmap.

The next step up would be to use a crate like arc-swap, which says:

Then one would lock, clone the [RwLock<Arc<T>>] and unlock. This suffers from CPU-level contention (on the lock and on the reference count of the Arc) which makes it relatively slow. Depending on the implementation, an update may be blocked for arbitrary long time by a steady inflow of readers.

The ArcSwap can be used instead, which solves the above problems and has better performance characteristics than the RwLock, both in contended and non-contended scenarios.

I often advocate for performing some kind of smarter algorithm. For example, you could spin up N threads each with their own HashMap. You then shard work among them. For the simple example above, you could use id % N_THREADS, for example. There are also complicated sharding schemes that depend on your data.

As Go has done a good job of evangelizing: do not communicate by sharing memory; instead, share memory by communicating.

like image 171
Shepmaster Avatar answered Dec 20 '22 17:12

Shepmaster