Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How good is the JVM at parallel processing? When should I create my own Threads and Runnables? Why might threads interfere?

I have a Java program that runs many small simulations. It runs a genetic algorithm, where each fitness function is a simulation using parameters on each chromosome. Each one takes maybe 10 or so seconds if run by itself, and I want to run a pretty big population size (say 100?). I can't start the next round of simulations until the previous one has finished. I have access to a machine with a whack of processors in it and I'm wondering if I need to do anything to make the simulations run in parallel. I've never written anything explicitly for multicore processors before and I understand it's a daunting task.

So this is what I would like to know: To what extent and how well does the JVM parallel-ize? I have read that it creates low level threads, but how smart is it? How efficient is it? Would my program run faster if I made each simulation a thread? I know this is a huge topic, but could you point me towards some introductory literature concerning parallel processing and Java?

Thanks very much!

Update: Ok, I've implemented an ExecutorService and made my small simulations implement Runnable and have run() methods. Instead of writing this:

Simulator sim = new Simulator(args); 
sim.play(); 
return sim.getResults(); 

I write this in my constructor:

ExecutorService executor = Executors.newFixedThreadPool(32);

And then each time I want to add a new simulation to the pool, I run this:

RunnableSimulator rsim = new RunnableSimulator(args); 
exectuor.exectue(rsim); 
return rsim.getResults(); 

The RunnableSimulator::run() method calls the Simulator::play() method, neither have arguments.

I think I am getting thread interference, because now the simulations error out. By error out I mean that variables hold values that they really shouldn't. No code from within the simulation was changed, and before the simulation ran perfectly over many many different arguments. The sim works like this: each turn it's given a game-piece and loops through all the location on the game board. It checks to see if the location given is valid, and if so, commits the piece, and measures that board's goodness. Now, obviously invalid locations are being passed to the commit method, resulting in index out of bounds errors all over the place.

Each simulation is its own object right? Based on the code above? I can pass the exact same set of arguments to the RunnableSimulator and Simulator classes and the runnable version will throw exceptions. What do you think might cause this and what can I do to prevent it? Can I provide some code samples in a new question to help?

like image 253
hornairs Avatar asked Dec 06 '22 06:12

hornairs


2 Answers

Java Concurrency Tutorial

If you're just spawning a bunch of stuff off to different threads, and it isn't going to be talking back and forth between different threads, it isn't too hard; just write each in a Runnable and pass them off to an ExecutorService.

You should skim the whole tutorial, but for this particular task, start here.

Basically, you do something like this:

ExecutorService executorService = Executors.newFixedThreadPool(n);

where n is the number of things you want running at once (usually the number of CPUs). Each of your tasks should be an object that implements Runnable, and you then execute it on your ExecutorService:

executorService.execute(new SimulationTask(parameters...));

Executors.newFixedThreadPool(n) will start up n threads, and execute will insert the tasks into a queue that feeds to those threads. When a task finishes, the thread it was running on is no longer busy, and the next task in the queue will start running on it. Execute won't block; it will just put the task into the queue and move on to the next one.

The thing to be careful of is that you really AREN'T sharing any mutable state between tasks. Your task classes shouldn't depend on anything mutable that will be shared among them (i.e. static data). There are ways to deal with shared mutable state (locking), but if you can avoid the problem entirely it will be a lot easier.

EDIT: Reading your edits to your question, it looks like you really want something a little different. Instead of implementing Runnable, implement Callable. Your call() method should be pretty much the same as your current run(), except it should return getResults();. Then, submit() it to your ExecutorService. You will get a Future in return, which you can use to test if the simulation is done, and, when it is, get your results.

like image 119
Adam Jaskiewicz Avatar answered Dec 07 '22 18:12

Adam Jaskiewicz


You can also see the new fork join framework by Doug Lea. One of the best book on the subject is certainly Java Concurrency in Practice. I would strong recommend you to take a look at the fork join model.

like image 31
dfa Avatar answered Dec 07 '22 19:12

dfa