Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Volatile array of arrays

I have a class with

private volatile long[][] data = new long[SIZE][];

which initially contains just nulls and a method which accesses it. When it hits an null element, it creates a long[] and stores for future use. This operation is idempotent and multiple threads wasting time on the same element is not a problem.

No thread must ever see an incompletely filled element. I hope that the following code does it right:

long[] getOrMakeNewElement(int index) {
    long[] element = data[index]; // volatile read
    if (element==null) {
        element = makeNewElement(index); // local operation
        data = data; // ugliness 1
        data[index] = element;
        data = data; // ugliness 2
    }
    return element;
}

The first ugliness is for ensuring that other threads can in theory see the changes made to element. They can't actually access it as it isn't stored anyway yet. However, in the next like element gets stored and another thread may or may not see this store, so AFAIK the first ugliness is necessary.

The second ugliness then just ensures that other threads see the new data containing element. The strange thing here is using the ugliness twice.

Is this is necessary and sufficient for safe publication?

Note: While this question is similar to this one, it's no duplicate as it deals with modifying an existing 1D array rather than creating one. This makes the answer clear.

Update

Note: This is no production code and I know and don't care about the alternatives (AtomicReferenceArray, synchronized, whatever, ...). I wrote this question in order to learn more about the JMM. It's a real code, but used just for my fooling around with project Euler and no animals were harmed in the process.

I guess, a safe and practical solution would be

class Element {
    Element(int index) {
        value = makeNewElementValue(index);
    }
    final long[] value;
}

private volatile Element[] data = new Element[SIZE];

where the Element ensures visibility via the Semantics of final Fields.

As pointed by user2357112, there's also a (IMHO harmless) data race when multiple thread write the same data[index], which is actually easy to avoid. While the reading must be fast, making a new element is slow enough to allow for any synchronization needed. It'll also allow a more efficient initialization of the data.

like image 478
maaartinus Avatar asked May 24 '15 06:05

maaartinus


Video Answer


2 Answers

The Java Language Specification defines the semantics of volatile as follows:

A write to a volatile variable v synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).

The spec guarantees that whenever an action happens-before another, it is visible to that other action.

For an object to be safely published, the object's initialization must be visible to any thread that obtains a reference to this object. Since the field we are interested in is not final, this can only be accomplished if the initialization of the object is guaranteed to happen-before any other thread obtains a reference to this object.

To verify whether this is the case, let's look at the happens-before graph of the actions involved:

  makeNewElement        
        |               
        v               
   read of data         
        |               
        v               ?
  write to data --------------> read of data
        |                             |
        v                             v
write to array element       read of array element
        |                             |
        v                             V
  read of data                   useElement
        |
        v
  write to data

Clearly, there is a path from makeNewElement to useElement if and only if "write of data" synchronizes-with the "read of data", which it does if and only if the read is subsequent. However, it need not be subsequent: For every execution where the read is subsequent, we can create another execution where it is not, simply by moving the read backwards in synchronization order:

  makeNewElement                   
        |                                
        v                                
   read of data                    read of data
        |                                |
        v                                |
  write to data                          |
        |                                |
        v                                v
write to array element          read of array element            
        |                                |
        v                                v
  read of data                      useElement
        |                                
        v                                
  write to data                     

Ordinarily, we could not do this, as this would change the value being read, but as the writes do not change the value of data, we can not tell from the value read whether the read was before or after the write.

In such an execution, the reference to our new object is published through a data race, because the write of the array element does not happen-before the read. In such cases, the spec writes:

More specifically, if two actions share a happens-before relationship, they do not necessarily have to appear to have happened in that order to any code with which they do not share a happens-before relationship. Writes in one thread that are in a data race with reads in another thread may, for example, appear to occur out of order to those reads.

That is, the reading thread may see the reference to the new object, but not the effect of its initialization, which would be rather bad.

So, how could we ensure that the read is subsequent? If the value read from the volatile field proves that the necessary write has taken place. In your case, we need to distinguish writes to different elements. We can do this with a separate volatile variable for each array element:

Element[] data = ...;

class Element {
    volatile long[] value;
}

long[] getOrMakeNewElement(int index) {
    long[] element = data[index].value; // volatile read
    if (element==null) {
        element = makeNewElement(index); // local operation
        data[index].value = element;
    }
    return element;
}

or change the value of your single volatile field for each write:

volatile long[][] data;

long[] getOrMakeNewElement(int index) {
    long[] element = data[index]; // volatile read
    if (element==null) {
        long[][] newData = Arrays.copyOf(data);
        newData[index] = element = makeNewElement(index);
        data = newData;
    }
    return element;
}    

Of course, this latter solution has the disadvantage that concurrent writes can conflict even for different array elements.

like image 145
meriton Avatar answered Oct 21 '22 23:10

meriton


This is probably safe in practice, according to roach-motel model, or JSR-133 Cookbook.

Let's expand all volatile reads/writes in the following code

    element[0] = something;
    data = data; // ugliness 1
    data[index] = element;

it becomes

    element[0] = something;             [0]
    tmp1 = data;  // volatile read      [1]
    data = tmp1;  // volatile write     [2]
    tmp2 = data;  // volatile read      [3]
    tmp2[index] = element;              [4]

The critical barrier here is [2]+[3]. According to roach motel model:

  nothing before a volatile write can be reordered after  it.
  nothing after  a volatile read  can be reordered before it.

Therefore [2]+[3] combo prevents instructions from being reordered across it. Therefore [0] and [4] cannot be reordered.

Note that [1]+[2] combo is not enough; without [3], the code can be reordered as [1] [4][0] [2].


That's cool. But is it safe in JMM which is weaker than roach-motel? I think it's not. This code

    long[] element = data[index];
    x = element[0];

is expanded to

    tmp0 = data; // volatile read   [a]
                                    [b]  
    element = tmp0[index];          [c]
    x = element[0]                  [d]

Now imagine the earliest thread does [a], then gets paused in [b] for a few seconds, then does [c]. Now [a] is before any other volatile writes, therefore it does not establish any ordering per JMM, therefore there's no telling what [c] and [d] could read. [c] is not a serious problem, but [d] is - [d] could read the default value 0 instead of something written by [0]; or even gibberish because it's long.

As far as I know, if you want to publish an object through a volatile field, either it must be a new object (immutable or not), or it must use volatile members.

JMM is strong enough to guarantee that common concurrency patterns work; but if you want to do something unconventional, it is often too weak; and the worse thing is that JMM is too complicated as a reasoning framework. So if you want your code strictly JMM compliant, the best practice is to follow boring ordinary patterns:)


A better solution that favors reads would be copy-on-write.

like image 28
ZhongYu Avatar answered Oct 21 '22 23:10

ZhongYu