Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it thread-safe to synchronize only on add to HashSet?

Imagine having a main thread which creates a HashSet and starts a lot of worker threads passing HashSet to them.

Just like in code below:

void main() {
  final Set<String> set = new HashSet<>();
  final ExecutorService threadExecutor = 
  Executors.newFixedThreadPool(10);

  threadExecutor.submit(() -> doJob(set));
} 

void doJob(final Set<String> pSet) {
  // do some stuff
  final String x = ... // doesn't matter how we received the value.
  if (!pSet.contains(x)) {
    synchronized (pSet) {
      // double check to prevent multiple adds within different threads
      if (!pSet.contains(x)) {
        // do some exclusive work with x.
        pSet.add(x);
      }
    }
  }
  // do some stuff
}

I'm wondering is it thread-safe to synchronize only on add method? Is there any possible issues if contains is not synchronized?

My intuition telling me this is fine, after leaving synchronized block changes made to set should be visible to all threads, but JMM could be counter-intuitive sometimes.

P.S. I don't think it's a duplicate of How to lock multiple resources in java multithreading Even though answers to both could be similar, this question addresses more particular case.

like image 494
Akirus Avatar asked Aug 05 '19 14:08

Akirus


People also ask

Does HashSet is synchronized or not?

HashMap allows null values and null keys. Both HashSet and HashMap are not synchronized.

Is HashSet thread-safe?

Thread Safe HashSet Using ConcurrentHashMap Factory Method Basically, this method returns an instance that respects the java. util. Set interface and allows the usage of standard methods like add(), contains(), etc.

Is synchronized thread-safe?

synchronized keyword is one of the way to achieve 'thread safe'. But Remember:Actually while multiple threads tries to access synchronized method they follow the order so becomes safe to access.

When should you synchronize threads?

Thread synchronization is the concurrent execution of two or more threads that share critical resources. Threads should be synchronized to avoid critical resource use conflicts. Otherwise, conflicts may arise when parallel-running threads attempt to modify a common variable at the same time.


3 Answers

I'm wondering is it thread-safe to synchronize only on the add method? Are there any possible issues if contains is not synchronized as well?

Short answers: No and Yes.

There are two ways of explaining this:

The intuitive explanation

Java synchronization (in its various forms) guards against a number of things, including:

  • Two threads updating shared state at the same time.
  • One thread trying to read state while another is updating it.
  • Threads seeing stale values because memory caches have not been written to main memory.

In your example, synchronizing on add is sufficient to ensure that two threads cannot update the HashSet simultaneously, and that both calls will be operating on the most recent HashSet state.

However, if contains is not synchronized as well, a contains call could happen simultaneously with an add call. This could lead to the contains call seeing an intermediate state of the HashSet, leading to an incorrect result, or worse. This can also happen if the calls are not simultaneous, due to changes not being flushed to main memory immediately and/or the reading thread not reading from main memory.

The Memory Model explanation

The JLS specifies the Java Memory Model which sets out the conditions that must be fulfilled by a multi-threaded application to guarantee that one thread sees the memory updates made by another. The model is expressed in mathematical language, and not easy to understand, but the gist is that visibility is guaranteed if and only if there is a chain of happens before relationships from the write to a subsequent read. If the write and read are in different threads, then synchronization between the threads is the primary source of these relationships. For example in

 // thread one
 synchronized (sharedLock) {
    sharedVariable = 42;
 }

 // thread two
 synchronized (sharedLock) {
     other = sharedVariable;
 }

Assuming that the thread one code is run before the thread two code, there is a happens before relationships between thread one releasing the lock and thread two acquiring it. With this and the "program order" relations, we can build a chain from the write of 42 to the assignment to other. This is sufficient to guarantee that other will be assigned 42 (or possibly a later value of the variable) and NOT any value in sharedVariable before 42 was written to it.

Without the synchronized block synchronizing on the same lock, the second thread could see a stale value of sharedVariable; i.e. some value written to it before 42 was assigned to it.

like image 51
Stephen C Avatar answered Oct 09 '22 03:10

Stephen C


That code is thread safe for the the synchronized (pSet) { } part :

if (!pSet.contains(x)) {
  synchronized (pSet) { 
  // Here you are sure to have the updated value of pSet    
  if (!pSet.contains(x)) {
    // do some exclusive work with x.
    pSet.add(x);
  }
}

because inside the synchronized statement on the pSet object :

  • one and only one thread may be in this block.
  • and inside it, pSet has also its updated state guaranteed by the happens-before relationship with the synchronized keyword.

So whatever the value returned by the first if (!pSet.contains(x)) statement for a waiting thread, when this waited thread will wake up and enter in the synchronized statement, it will set the last updated value of pSet. So even if the same element was added by a previous thread, the second if (!pSet.contains(x)) would return false.

But this code is not thread safe for the first statement if (!pSet.contains(x)) that could be executed during a writing on the Set.
As a rule of thumb, a collection not designed to be thread safe should not be used to perform concurrently writing and reading operations because the internal state of the collection could be in a in-progress/inconsistent state for a reading operation that would occur meanwhile a writing operation.
While some no thread safe collection implementations accept such a usage in the facts, that is not guarantee at all that it will always be true.
So you should use a thread safe Set implementation to guarantee the whole thing thread safe.
For example with :

Set<String> pSet = ConcurrentHashMap.newKeySet();

That uses under the hood a ConcurrentHashMap, so no lock for reading and a minimal lock for writing (only on the entry to modify and not the whole structure).

like image 24
davidxxx Avatar answered Oct 09 '22 03:10

davidxxx


No,

You don't know in what state the Hashset might be during add by another Thread. There might be fundamental changes ongoing, like splitting of buckets, so that contains may return false during the adding by another thread, even if the element would be there in a singlethreaded HashSet. In that case you would try to add an element a second time.

Even Worse Scenario: contains might get into an endless loop or throw an exception because of an temporary invalid state of the HashSet in the memory used by the two threads at the same time.

like image 1
aschoerk Avatar answered Oct 09 '22 03:10

aschoerk