If I have an unsynchronized java collection in a multithreaded environment, and I don't want to force readers of the collection to synchronize[1], is a solution where I synchronize the writers and use the atomicity of reference assignment feasible? Something like:
private Collection global = new HashSet(); // start threading after this
void allUpdatesGoThroughHere(Object exampleOperand) {
// My hypothesis is that this prevents operations in the block being re-ordered
synchronized(global) {
Collection copy = new HashSet(global);
copy.remove(exampleOperand);
// Given my hypothesis, we should have a fully constructed object here. So a
// reader will either get the old or the new Collection, but never an
// inconsistent one.
global = copy;
}
}
// Do multithreaded reads here. All reads are done through a reference copy like:
// Collection copy = global;
// for (Object elm: copy) {...
// so the global reference being updated half way through should have no impact
Rolling your own solution seems to often fail in these type of situations, so I'd be interested in knowing other patterns, collections or libraries I could use to prevent object creation and blocking for my data consumers.
[1] The reasons being a large proportion of time spent in reads compared to writes, combined with the risk of introducing deadlocks.
Edit: A lot of good information in several of the answers and comments, some important points:
First of all, reference assignment is atomic because the specification says so. Besides that, there is no obstacle for JVM implementors to fulfill this constraint, as 64 Bit references are usually only used on 64 Bit architectures, where atomic 64 Bit assignment comes for free.
AtomicReference is flexible way to update the variable value atomically without use of synchronization. AtomicReference support lock-free thread-safe programming on single variables. There are multiple ways of achieving Thread safety with high level concurrent API. Atomic variables is one of the multiple options.
Atomic reference counting is expensive mainly under contention, when the same object is used from multiple threads. Uncontended atomic operations are ~2-3x slower than their nonatomic analogs, which is still very fast.
The AtomicReference class provides an object reference variable which can be read and written atomically. By atomic is meant that multiple threads attempting to change the same AtomicReference (e.g. with a compare-and-swap operation) will not make the AtomicReference end up in an inconsistent state.
Rather than trying to roll out your own solution, why not use a ConcurrentHashMap as your set and just set all the values to some standard value? (A constant like Boolean.TRUE
would work well.)
I think this implementation works well with the many-readers-few-writers scenario. There's even a constructor that lets you set the expected "concurrency level".
Update: Veer has suggested using the Collections.newSetFromMap utility method to turn the ConcurrentHashMap into a Set. Since the method takes a Map<E,Boolean>
my guess is that it does the same thing with setting all the values to Boolean.TRUE
behind-the-scenes.
Update: Addressing the poster's example
That is probably what I will end up going with, but I am still curious about how my minimalist solution could fail. – MilesHampson
Your minimalist solution would work just fine with a bit of tweaking. My worry is that, although it's minimal now, it might get more complicated in the future. It's hard to remember all of the conditions you assume when making something thread-safe—especially if you're coming back to the code weeks/months/years later to make a seemingly insignificant tweak. If the ConcurrentHashMap does everything you need with sufficient performance then why not use that instead? All the nasty concurrency details are encapsulated away and even 6-months-from-now you will have a hard time messing it up!
You do need at least one tweak before your current solution will work. As has already been pointed out, you should probably add the volatile
modifier to global
's declaration. I don't know if you have a C/C++ background, but I was very surprised when I learned that the semantics of volatile
in Java are actually much more complicated than in C. If you're planning on doing a lot of concurrent programming in Java then it'd be a good idea to familiarize yourself with the basics of the Java memory model. If you don't make the reference to global
a volatile
reference then it's possible that no thread will ever see any changes to the value of global
until they try to update it, at which point entering the synchronized
block will flush the local cache and get the updated reference value.
However, even with the addition of volatile
there's still a huge problem. Here's a problem scenario with two threads:
global={}
. Threads A
and B
both have this value in their thread-local cached memory.A
obtains obtains the synchronized
lock on global
and starts the update by making a copy of global
and adding the new key to the set.A
is still inside the synchronized
block, Thread B
reads its local value of global
onto the stack and tries to enter the synchronized
block. Since Thread A
is currently inside the monitor Thread B
blocks.A
completes the update by setting the reference and exiting the monitor, resulting in global={1}
.B
is now able to enter the monitor and makes a copy of the global={1}
set.A
decides to make another update, reads in its local global
reference and tries to enter the synchronized
block. Since Thread B currently holds the lock on {}
there is no lock on {1}
and Thread A
successfully enters the monitor!
A
also makes a copy of {1}
for purposes of updating.Now Threads A
and B
are both inside the synchronized
block and they have identical copies of the global={1}
set. This means that one of their updates will be lost! This situation is caused by the fact that you're synchronizing on an object stored in a reference that you're updating inside your synchronized
block. You should always be very careful which objects you use to synchronize. You can fix this problem by adding a new variable to act as the lock:
private volatile Collection global = new HashSet(); // start threading after this
private final Object globalLock = new Object(); // final reference used for synchronization
void allUpdatesGoThroughHere(Object exampleOperand) {
// My hypothesis is that this prevents operations in the block being re-ordered
synchronized(globalLock) {
Collection copy = new HashSet(global);
copy.remove(exampleOperand);
// Given my hypothesis, we should have a fully constructed object here. So a
// reader will either get the old or the new Collection, but never an
// inconsistent one.
global = copy;
}
}
This bug was insidious enough that none of the other answers have addressed it yet. It's these kinds of crazy concurrency details that cause me to recommend using something from the already-debugged java.util.concurrent library rather than trying to put something together yourself. I think the above solution would work—but how easy would it be to screw it up again? This would be so much easier:
private final Set<Object> global = Collections.newSetFromMap(new ConcurrentHashMap<Object,Boolean>());
Since the reference is final
you don't need to worry about threads using stale references, and since the ConcurrentHashMap
handles all the nasty memory model issues internally you don't have to worry about all the nasty details of monitors and memory barriers!
According to the relevant Java Tutorial,
We have already seen that an increment expression, such as
c++
, does not describe an atomic action. Even very simple expressions can define complex actions that can decompose into other actions. However, there are actions you can specify that are atomic:
- Reads and writes are atomic for reference variables and for most primitive variables (all types except
long
anddouble
).- Reads and writes are atomic for all variables declared
volatile
(includinglong
anddouble
variables).
This is reaffirmed by Section §17.7 of the Java Language Specification
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.
It appears that you can indeed rely on reference access being atomic; however, recognize that this does not ensure that all readers will read an updated value for global
after this write -- i.e. there is no memory ordering guarantee here.
If you use an implicit lock via synchronized
on all access to global
, then you can forge some memory consistency here... but it might be better to use an alternative approach.
You also appear to want the collection in global
to remain immutable... luckily, there is Collections.unmodifiableSet
which you can use to enforce this. As an example, you should likely do something like the following...
private volatile Collection global = Collections.unmodifiableSet(new HashSet());
... that, or using AtomicReference
,
private AtomicReference<Collection> global = new AtomicReference<>(Collections.unmodifiableSet(new HashSet()));
You would then use Collections.unmodifiableSet
for your modified copies as well.
// ... All reads are done through a reference copy like:
// Collection copy = global;
// for (Object elm: copy) {...
// so the global reference being updated half way through should have no impact
You should know that making a copy here is redundant, as internally for (Object elm : global)
creates an Iterator
as follows...
final Iterator it = global.iterator();
while (it.hasNext()) {
Object elm = it.next();
}
There is therefore no chance of switching to an entirely different value for global
in the midst of reading.
All that aside, I agree with the sentiment expressed by DaoWen... is there any reason you're rolling your own data structure here when there may be an alternative available in java.util.concurrent
? I figured maybe you're dealing with an older Java, since you use raw types, but it won't hurt to ask.
You can find copy-on-write collection semantics provided by CopyOnWriteArrayList
, or its cousin CopyOnWriteArraySet
(which implements a Set
using the former).
Also suggested by DaoWen, have you considered using a ConcurrentHashMap
? They guarantee that using a for
loop as you've done in your example will be consistent.
Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
Internally, an Iterator
is used for enhanced for
over an Iterable
.
You can craft a Set
from this by utilizing Collections.newSetFromMap
like follows:
final Set<E> safeSet = Collections.newSetFromMap(new ConcurrentHashMap<E, Boolean>());
...
/* guaranteed to reflect the state of the set at read-time */
for (final E elem : safeSet) {
...
}
I think your original idea was sound, and DaoWen did a good job getting the bugs out. Unless you can find something that does everything for you, it's better to understand these things than hope some magical class will do it for you. Magical classes can make your life easier and reduce the number of mistakes, but you do want to understand what they are doing.
ConcurrentSkipListSet might do a better job for you here. It could get rid of all your multithreading problems.
However, it is slower than a HashSet (usually--HashSets and SkipLists/Trees hard to compare). If you are doing a lot of reads for every write, what you've got will be faster. More importantly, if you update more than one entry at a time, your reads could see inconsistent results. If you expect that whenever there is an entry A there is an entry B, and vice versa, the skip list could give you one without the other.
With your current solution, to the readers, the contents of the map are always internally consistent. A read can be sure there's an A for every B. It can be sure that the size()
method gives the precise number of elements that will be returned by the iterator. Two iterations will return the same elements in the same order.
In other words, allUpdatesGoThroughHere and ConcurrentSkipListSet are two good solutions to two different problems.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With