Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multithreaded access and variable cache of threads

I could find the answer if I read a complete chapter/book about multithreading, but I'd like a quicker answer. (I know this stackoverflow question is similar, but not sufficiently.)

Assume there is this class:

public class TestClass {
   private int someValue;

   public int getSomeValue() { return someValue; }
   public void setSomeValue(int value) {  someValue = value; }
}

There are two threads (A and B) that access the instance of this class. Consider the following sequence:

  1. A: getSomeValue()
  2. B: setSomeValue()
  3. A: getSomeValue()

If I'm right, someValue must be volatile, otherwise the 3rd step might not return the up-to-date value (because A may have a cached value). Is this correct?

Second scenario:

  1. B: setSomeValue()
  2. A: getSomeValue()

In this case, A will always get the correct value, because this is its first access so he can't have a cached value yet. Is this right?

If a class is accessed only in the second way, there is no need for volatile/synchronization, or is it?

Note that this example was simplified, and actually I'm wondering about particular member variables and methods in a complex class, and not about whole classes (i.e. which variables should be volatile or have synced access). The main point is: if more threads access certain data, is synchronized access needed by all means, or does it depend on the way (e.g. order) they access it?


After reading the comments, I try to present the source of my confusion with another example:

  1. From UI thread: threadA.start()
  2. threadA calls getSomeValue(), and informs the UI thread
  3. UI thread gets the message (in its message queue), so it calls: threadB.start()
  4. threadB calls setSomeValue(), and informs the UI thread
  5. UI thread gets the message, and informs threadA (in some way, e.g. message queue)
  6. threadA calls getSomeValue()

This is a totally synchronized structure, but why does this imply that threadA will get the most up-to-date value in step 6? (if someValue is not volatile, or not put into a monitor when accessed from anywhere)

like image 859
Thomas Calc Avatar asked Jun 29 '12 22:06

Thomas Calc


People also ask

What are the two types of multithreading?

Multithreading is running multiple tasks within a process. It is of two types, namely user level threads and kernel level threads. It is economical, responsive, scalable, efficient, and allows resource sharing. There are three models in multithreading: Many to many model, Many to one model, and one to one model.

What is the difference between threading and multithreading?

Single threaded processes contain the execution of instructions in a single sequence. In other words, one command is processes at a time. The opposite of single threaded processes are multithreaded processes. These processes allow the execution of multiple parts of a program at the same time.

Can two threads access same variable?

Only one thread can read and write a shared variable at a time. When one thread is accessing a shared variable, other threads should wait until the first thread is done. This guarantees that the access to a shared variable is Atomic, and multiple threads do not interfere.

What is thread cache?

The Malloc Thread Cache maintains a per-thread pool of unallocated memory for the purpose of reducing contention for the global heap structures. This cache attempts to preallocate memory pieces for future use according to the pattern of allocations already performed by the thread.


4 Answers

If two threads are calling the same methods, you can't make any guarantees about the order that said methods are called. Consequently, your original premise, which depends on calling order, is invalid.

It's not about the order in which the methods are called; it's about synchronization. It's about using some mechanism to make one thread wait while the other fully completes its write operation. Once you've made the decision to have more than one thread, you must provide that synchronization mechanism to avoid data corruption.

like image 182
Robert Harvey Avatar answered Oct 13 '22 06:10

Robert Harvey


As we all know, that its the crucial state of the data that we need to protect, and the atomic statements which govern the crucial state of the data must be Synchronized.

I had this example, where is used volatile, and then i used 2 threads which used to increment the value of a counter by 1 each time till 10000. So it must be a total of 20000. but to my surprise it didnt happened always.

Then i used synchronized keyword to make it work.

Synchronization makes sure that when a thread is accessing the synchronized method, no other thread is allowed to access this or any other synchronized method of that object, making sure that data corruption is not done.

Thread-Safe class means that it will maintain its correctness in the presence of the scheduling and interleaving of the underlining Runtime environment, without any thread-safe mechanism from the Client side, which access that class.

like image 20
Kumar Vivek Mitra Avatar answered Oct 13 '22 07:10

Kumar Vivek Mitra


Let's look at the book.

A field may be declared volatile, in which case the Java memory model (§17) ensures that all threads see a consistent value for the variable.

So volatile is a guarantee that the declared variable won't be copied into thread local storage, which is otherwise allowed. It's further explained that this is an intentional alternative to locking for very simple kinds of synchronized access to shared storage.

Also see this earlier article, which explains that int access is necessarily atomic (but not double or long).

These together mean that if your int field is declared volatile then no locks are necessary to guarantee atomicity: you will always see a value that was last written to the memory location, not some confused value resulting from a half-complete write (as is possible with double or long).

However you seem to imply that your getters and setters themselves are atomic. This is not guaranteed. The JVM can interrupt execution at intermediate points of during the call or return sequence. In this example, this has no consequences. But if the calls had side effects, e.g. setSomeValue(++val), then you would have a different story.

like image 42
Gene Avatar answered Oct 13 '22 06:10

Gene


The issue is that java is simply a specification. There are many JVM implementations and examples of physical operating environments. On any given combination an an action may be safe or unsafe. For instance On single processor systems the volatile keyword in your example is probably completely unnecessary. Since the writers of the memory and language specifications can't reasonably account for possible sets of operating conditions, they choose to white-list certain patterns that are guaranteed to work on all compliant implementations. Adhering to to these guidelines ensures both that your code will work on your target system and that it will be reasonably portable.

In this case "caching" typically refers to activity that is going on at the hardware level. There are certain events that occur in java that cause cores on a multi processor systems to "Synchronize" their caches. Accesses to volatile variables are an example of this, synchronized blocks are another. Imagine a scenario where these two threads X and Y are scheduled to run on different processors.

X starts and is scheduled on proc 1
y starts and is scheduled on proc 2

.. now you have two threads executing simultaneously
to speed things up the processors check local caches
before going to main memory because its expensive.

x calls setSomeValue('x-value') //assuming proc 1's cache is empty the cache is set
                                //this value is dropped on the bus to be flushed
                                //to main memory
                                //now all get's will retrieve from cache instead
                                //of engaging the memory bus to go to main memory 
y calls setSomeValue('y-value') //same thing happens for proc 2

//Now in this situation depending on to order in which things are scheduled and
//what thread you are calling from calls to getSomeValue() may return 'x-value' or
//'y-value. The results are completely unpredictable.  

The point is that volatile(on compliant implementations) ensures that ordered writes will always be flushed to main memory and that other processor's caches will be flagged as 'dirty' before the next access regardless of the thread from which that access occurs.

disclaimer: volatile DOES NOT LOCK. This is important especially in the following case:

volatile int counter;

public incrementSomeValue(){
    counter++; // Bad thread juju - this is at least three instructions 
               // read - increment - write             
               // there is no guarantee that this operation is atomic
}

this could be relevant to your question if your intent is that setSomeValue must always be called before getSomeValue

If the intent is that getSomeValue() must always reflect the most recent call to setSomeValue() then this is a good place for the use of the volatile keyword. Just remember that without it there is no guarantee that getSomeValue() will reflect to most recent call to setSomeValue() even if setSomeValue() was scheduled first.

like image 22
nsfyn55 Avatar answered Oct 13 '22 07:10

nsfyn55