Programming to interfaces and synchronized collections

This question relates to Java collections - specifically Hashtable and Vector - but may also apply elsewhere.

I've read in many places how good it is to program to interfaces and I agree 100%. The ability to program to a List interface, for instance, without regard for the underlying implementation is most certainly helpful for decoupling and testing purposes. With collections, I can see how an ArrayList and a LinkedList are applicable under different circumstances, given the differences with respect to internal storage structure, random access times, etc. Yet these two implementations can be used under the same interface, which is great.

What I can't seem to place is how certain synchronized implementations (in particular Hashtable and Vector) fit in with these interfaces. To me, they don't seem to fit the model. Most of the underlying data structure implementations seem to vary in how the data is stored (LinkedList, Array, sorted tree, etc.), whereas synchronization deals with conditions (locking conditions) under which the data may be accessed. Let's look at an example where a method returns a Map collection:

public Map<String, String> getSomeData();

Let's assume that the application is not concerned at all with concurrency. In this case, we operate on whatever implementation the method returns via the interface...Everybody is happy. The world is stable.

However, what if the application now requires attention on the concurrency front? We now cannot operate without regard for the underlying implementation - Hashtable would be fine, but other implementations must be catered for. Let's consider 3 scenarios:

1) Enforce synchronization using synchronization blocks, etc. when adding/removing with the collection. Wouldn't this, however, be overkill in the event that a synchronized implementation (Hashtable) gets returned?

2) Change the method signature to return Hashtable. This, however, tightly binds us to the Hashtable implementation, and as a result, the advantages of programming to an interface are thrown out the window.

3) Make use of the concurrent package and change the method signature to return an implementation of the ConcurrentMap interface. To me, this seems like the way forward.
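To make scenario 3 concrete, here is a minimal sketch of what the changed signature could look like. The DataProvider class and the sample keys are made up for illustration; only ConcurrentMap and ConcurrentHashMap come from java.util.concurrent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical provider; the class name and sample keys are illustrative only.
final class DataProvider {
    private final ConcurrentMap<String, String> data = new ConcurrentHashMap<>();

    // The return type advertises thread safety without naming the implementation.
    ConcurrentMap<String, String> getSomeData() {
        return data;
    }
}

final class ConcurrentMapDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, String> map = new DataProvider().getSomeData();

        // putIfAbsent is atomic, so check-then-insert needs no external lock.
        map.putIfAbsent("color", "red");
        map.putIfAbsent("color", "blue"); // ignored, the key is already present

        System.out.println(map.get("color")); // prints "red"
    }
}
```

The caller still programs to an interface, but one whose contract includes thread safety, so nothing has to be guessed about the class behind it.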

Essentially, it just seems like certain synchronized implementations are a bit of a misfit within the collections framework in that, when programming to interfaces, the synchronization issue almost forces one to think about the underlying implementation.

Am I completely missing the point here?

Thanks.

asked Oct 19 '09 at 21:10 by user192585



1 Answer

1) Yes, it will be overkill
2) Correct, that should not be done
3) Depends on the situation.

The thing is, as you already know, programming to the interface describes what the application does (not how it does it; that's implementation).

Synchronization was removed from subsequent implementations (remember, Vector and Hashtable predate Java 1.2; ArrayList and HashMap came later and were not synchronized, but all of them implement the List and Map interfaces respectively) because synchronizing every call carries a performance penalty. For instance, if you use a Vector from a single thread, you still pay for synchronization within that single thread.

Sharing a data structure between multiple threads is something that has to be considered when designing the application. That is where you pick the methods you will use and decide who is responsible for keeping the data state clean.

Here's where you choose between options 1 and 3 that you mentioned. Will there be manual synchronization? Should we use a synchronized interface? Which Java versions will we support? And so on.

For instance, if you pick 1, you can also reject certain implementations (e.g. Vector) in your design.
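As a rough sketch of option 1, one class can own a plain HashMap and be the only code that touches it, so an already-synchronized implementation is never chosen and never locked twice. The Registry class and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: it owns the map, so it also owns the locking policy.
final class Registry {
    // Deliberately an unsynchronized implementation; the lock below guards it.
    private final Map<String, String> data = new HashMap<>();
    private final Object lock = new Object();

    void put(String key, String value) {
        synchronized (lock) {
            data.put(key, value);
        }
    }

    String get(String key) {
        synchronized (lock) {
            return data.get(key);
        }
    }

    // A compound check-then-act runs under one lock, so no other thread
    // can insert the key between the check and the put.
    boolean putIfMissing(String key, String value) {
        synchronized (lock) {
            if (data.containsKey(key)) {
                return false;
            }
            data.put(key, value);
            return true;
        }
    }
}
```

Because the map never escapes the class, the choice of implementation stays a private detail and the callers never need to know how the data is protected.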

Data synchronization is not something that happens by "luck". You really have to design for it to happen correctly and not cause more problems than it solves.

During this design, you should pay attention to the options ( the implementations ) and/or the underlying infrastructure you'll use.

The easiest way to avoid excessive synchronization is to use immutable data and not share your data with other threads.
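A small sketch of the immutable-data idea, assuming a hypothetical Settings class: it copies its input once and only ever hands out a read-only view, so readers on any thread need no locking.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for data that never changes after construction.
final class Settings {
    private final Map<String, String> values;

    Settings(Map<String, String> source) {
        // Defensive copy first, then a read-only view: later changes to
        // 'source' are not visible here, and callers cannot modify the copy.
        this.values = Collections.unmodifiableMap(new HashMap<>(source));
    }

    // Still programmed to the Map interface; mutating calls such as put
    // throw UnsupportedOperationException.
    Map<String, String> getSomeData() {
        return values;
    }
}
```

The final field matters here: it lets other threads see the fully built map once the constructor has finished.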

This is very similar to Martin Fowler's First Law of Distributed Object Design:

"Hence, we get to my First Law of Distributed Object Design: Don't distribute your objects."

Would the first law of multithreaded applications be:

First law of multithreaded applications: don't share your data?

:)

Final note: the Collections class provides "synchronized" versions of some interfaces:

Synchronized List (Collections.synchronizedList)
Synchronized Map (Collections.synchronizedMap)
Synchronized Set (Collections.synchronizedSet)
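For illustration, this is roughly how the Map wrapper is used. Single calls are synchronized by the wrapper, but the Collections documentation requires the caller to lock on the wrapper while iterating. The class and method names below are made up for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class SynchronizedWrapperDemo {
    // Sums the lengths of all values in a map obtained from Collections.synchronizedMap.
    static int totalValueLength(Map<String, String> syncMap) {
        int total = 0;
        // Iteration spans many calls, so it must hold the wrapper's own lock.
        synchronized (syncMap) {
            for (String value : syncMap.values()) {
                total += value.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Callers still see only the Map interface.
        Map<String, String> map = Collections.synchronizedMap(new HashMap<String, String>());
        map.put("a", "one");
        map.put("b", "three");
        System.out.println(totalValueLength(map)); // prints 8
    }
}
```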

answered Nov 13 '22 at 15:11 by OscarRyz