So I am thinking about building a hobby project, a one-off kind of thing, just to brush up on my programming/design skills.
It's basically a multi-threaded web spider, with every thread updating the same data structure (a map of object -> int).
It is definitely overkill to use a database for this, and the only thing I could think of is a thread-safe singleton to contain my data structure. http://web.archive.org/web/20121106190537/http://www.ibm.com/developerworks/java/library/j-dcl/index.html
Is there a different approach I should look into?
Double-checked locking has been proven to be incorrect and flawed (at least in Java). Do a search or look at Wikipedia's entry on double-checked locking for the exact reason.
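For reference, the flaw in the classic idiom is the unsynchronized read of the instance field: another thread can see a non-null reference before the object's fields are visibly initialized. Under the Java 5 memory model (JSR-133), declaring the field volatile closes that gap. Here is a minimal sketch of that variant; the class name LazyRegistry is made up for illustration, and the simpler options are still easier to get right.

```java
// Hypothetical class name; sketches double-checked locking with a volatile field (Java 5+).
final class LazyRegistry {
    // volatile makes the write of the fully constructed object
    // happen-before any later read of this reference.
    private static volatile LazyRegistry instance;

    private LazyRegistry() { }

    static LazyRegistry getInstance() {
        LazyRegistry local = instance;          // first check, no lock taken
        if (local == null) {
            synchronized (LazyRegistry.class) {
                local = instance;               // second check, under the lock
                if (local == null) {
                    local = new LazyRegistry();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

Without the volatile modifier this is exactly the broken version, which is why the idiom is so easy to get wrong.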
First and foremost is program correctness. If your code is not thread-safe (in a multi-threaded environment) then it's broken. Correctness comes first before performance optimization.
To be correct you'll have to synchronize the whole getInstance method
public static synchronized Singleton getInstance() {
    if (instance == null) {
        instance = new Singleton();  // runs at most once; the method lock guards it
    }
    return instance;
}
or statically initialize it
private static final Singleton INSTANCE = new Singleton();
Using lazy initialization for the database in a web crawler is probably not worthwhile. Lazy initialization adds complexity and an ongoing speed hit. One case where it is justified is when there is a good chance the data will never be needed. Also, in an interactive application, it can be used to reduce startup time and give the illusion of speed.
For a non-interactive application like a web-crawler, which will surely need its database to exist right away, lazy initialization is a poor fit.
On the other hand, a web-crawler is easily parallelizable, and will benefit greatly from being multi-threaded. Using it as an exercise to master the java.util.concurrent library would be extremely worthwhile. Specifically, look at ConcurrentHashMap and ConcurrentSkipListMap, which will allow multiple threads to read and update a shared map.
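As a rough sketch of what that could look like for an object -> int map, the following uses ConcurrentHashMap.merge to increment a per-key counter atomically. The class VisitCounts, its methods, and the example URL are all made up for illustration; String keys are an assumption about what the crawler counts.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; a shared map updated by several crawler threads.
final class VisitCounts {
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    // merge() does the read-modify-write atomically per key,
    // so concurrent increments are not lost.
    void record(String url) {
        counts.merge(url, 1, Integer::sum);
    }

    int get(String url) {
        return counts.getOrDefault(url, 0);
    }

    // Starts `threads` workers that each record the same URL `perThread`
    // times, waits for them, and returns the final count for that URL.
    static int demo(int threads, int perThread) {
        VisitCounts vc = new VisitCounts();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    vc.record("http://example.com/");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return vc.get("http://example.com/");
    }
}
```

A plain HashMap with a separate get followed by put would lose updates here; the point of merge is that the whole increment is one atomic step.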
When you get rid of lazy initialization, the simplest Singleton pattern is something like this:
class Singleton {
    static final Singleton INSTANCE = new Singleton();
    private Singleton() { }
    ...
}
The keyword final is the key here. Even if you provide a static "getter" for the singleton rather than allowing direct field access, making the singleton final helps to ensure correctness and allows more aggressive optimization by the JIT compiler.
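Putting the two ideas together, a minimal sketch of the getter variant might look like this. The class name CrawlState, the increment method, and the String key type are assumptions for illustration only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical class; an eagerly initialized singleton wrapping the shared map.
final class CrawlState {
    // Created once during class initialization, which the JVM performs
    // in a thread-safe way; final prevents later reassignment.
    private static final CrawlState INSTANCE = new CrawlState();

    private final ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();

    private CrawlState() { }

    static CrawlState getInstance() {
        return INSTANCE;    // no locking needed on the read path
    }

    // Atomically adds one to the counter for `key` and returns the new value.
    int increment(String key) {
        return hits.merge(key, 1, Integer::sum);
    }
}
```

Any crawler thread can then call CrawlState.getInstance().increment(someKey) without extra synchronization of its own.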
If your life depended on a few microseconds then I would advise you to optimize your resource locking to where it actually mattered.
But in this case the keyword here is hobby project!
Which means that if you synchronize the entire getInstance() method, you will be fine in 99.9% of all cases. I would NOT recommend doing it any other way.
Later, if you prove by means of profiling that the getInstance() synchronization is the bottleneck of your project, then you can move on and optimize the concurrency. But I really doubt it will cause you trouble.
Jeach!