
How to use multiple threads in Java to iterate over a Collection where no two threads ever iterate over the same part of the Collection?

I need to iterate over a large ArrayList (~50,000 entries) and I need to use multiple threads to do this fairly quickly.

But I need each thread to start at a unique index so that no two threads ever iterate over the same part of the list. There will be a batchSize of 100 so each thread will loop from its startIndex to startIndex + 100.

Is there any way to achieve this? Note that I am only performing read operations here, no writes. Each entry in the list is just a String which is actually an SQL query which I am then executing against a DB via JDBC.

fulhamHead asked Feb 10 '23 05:02


2 Answers

If you only intend to read the List, not mutate it, you can simply define your Runnable to take the List and a startIndex as constructor arguments. There's no danger in concurrently reading an ArrayList (even the same indices) as long as the list is fully populated before the threads start and no thread modifies it while the others are reading.

To be safe, be sure to wrap your ArrayList in a call to Collections.unmodifiableList() and pass that List to your Runnables. That way you can be confident the threads will not modify the backing ArrayList.
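A minimal sketch of that idea, assuming one thread per batch and a counter standing in for the real JDBC call. The names BatchWorker, processAll and processed are made up for illustration; they are not from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker: reads exactly one batch of the shared, read-only list.
final class BatchWorker implements Runnable {
    private final List<String> queries;     // shared between threads, never modified
    private final int startIndex;           // first index this worker may read
    private final int batchSize;            // number of entries this worker owns
    private final AtomicInteger processed;  // assumption: stands in for real JDBC work

    BatchWorker(List<String> queries, int startIndex, int batchSize, AtomicInteger processed) {
        this.queries = queries;
        this.startIndex = startIndex;
        this.batchSize = batchSize;
        this.processed = processed;
    }

    @Override
    public void run() {
        // Clamp the end so the last batch does not run past the list.
        int end = Math.min(startIndex + batchSize, queries.size());
        for (int i = startIndex; i < end; i++) {
            String sql = queries.get(i);
            // Placeholder: execute `sql` against the database here.
            processed.incrementAndGet();
        }
    }

    // Starts one thread per batch, waits for all of them, returns the entry count.
    static int processAll(List<String> queries, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> readOnly = Collections.unmodifiableList(queries);
        AtomicInteger processed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int start = 0; start < readOnly.size(); start += batchSize) {
            Thread t = new Thread(new BatchWorker(readOnly, start, batchSize, processed));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return processed.get();
    }
}
```

With 50,000 entries and a batchSize of 100 this would start 500 threads, so in practice a bounded thread pool is probably a better fit than one thread per batch.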

Alternatively, you can construct sublists in your main thread (with List.subList()) so that you don't need to pass the startIndex to each thread. However, you still want to make the sublists unmodifiable before handing them to the threads. Six of one, half a dozen of the other.
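The sublist variant could look roughly like this. Batches and partition are hypothetical names; the sketch only relies on List.subList() and Collections.unmodifiableList() from the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Batches {
    // Splits `source` into consecutive, non-overlapping, unmodifiable views
    // of at most `batchSize` elements each. No elements are copied.
    static <T> List<List<T>> partition(List<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < source.size(); start += batchSize) {
            int end = Math.min(start + batchSize, source.size());
            batches.add(Collections.unmodifiableList(source.subList(start, end)));
        }
        return batches;
    }
}
```

Each thread then receives one element of the returned list and simply iterates over it, with no index arithmetic of its own.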

Even better would be to use Guava's ImmutableList; it's naturally thread-safe.

There are also parallel streams in Java 8, but take care with this approach; they're powerful, but easy to get wrong.
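As a rough illustration of a parallel stream used carefully: the sketch below avoids writing to shared state from inside the pipeline and lets collect() assemble the results. ParallelQueries and execute are hypothetical, and execute only returns the query length as a stand-in for real JDBC work.

```java
import java.util.List;
import java.util.stream.Collectors;

final class ParallelQueries {
    // Assumption: stand-in for running the query; a real version would use JDBC.
    static int execute(String sql) {
        return sql.length();
    }

    // Runs `execute` on every entry in parallel. Results are gathered by the
    // collector rather than by adding to a shared list from multiple threads,
    // which is a common way to get parallel streams wrong.
    static List<Integer> runAll(List<String> queries) {
        return queries.parallelStream()
                .map(ParallelQueries::execute)
                .collect(Collectors.toList());
    }
}
```

Note that a parallel stream splits the work itself, so there is no control over a batch size of exactly 100 here, and by default it runs on the common fork-join pool, which may not suit blocking database calls.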

dimo414 answered Feb 13 '23 03:02


If you use Java 8, look at list.stream().parallel() (or the equivalent list.parallelStream()).

For Java 7, use subList() outside of the threads to split the work into pieces. Each thread should then operate only on its own sub-list. For most lists, subList() is a very efficient operation that returns a view and doesn't copy the data. If the backing list is structurally modified afterwards, the sub-lists become invalid; with an ArrayList, the next access to a sub-list throws a ConcurrentModificationException.

As for feeding the data to the threads, I suggest looking at the Executor API and queues. Just put all the work pieces in the queue and let the executor figure everything out.
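One possible shape for this, assuming a fixed-size pool and one task per sub-list. ExecutorBatches and processAll are made-up names, and the loop body is a placeholder for the actual JDBC call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ExecutorBatches {
    // Submits one task per batch to a fixed pool and returns how many
    // entries were processed in total.
    static int processAll(List<String> queries, int batchSize, int threads) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int start = 0; start < queries.size(); start += batchSize) {
                int end = Math.min(start + batchSize, queries.size());
                final List<String> batch = queries.subList(start, end); // view, no copy
                tasks.add(() -> {
                    int done = 0;
                    for (String sql : batch) {
                        // Placeholder: execute `sql` against the database here.
                        done++;
                    }
                    return done;
                });
            }
            int total = 0;
            // invokeAll queues every task and blocks until all have finished.
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool's internal work queue holds the pending batches, so the number of threads stays fixed no matter how many batches there are.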

Aaron Digulla answered Feb 13 '23 01:02