I've written a program that counts lines, words, and characters in a text: it does this with threads. It works great sometimes, but not so great other times. What ends up happening is the variables pointing to the number of words and characters counted sometimes come up short and sometimes don't.
It seems to me that the threads are sometimes ending before they can count all the words or characters that they want to. Is it because these threads go out of scope when the while (true) loop breaks?
I've included the code from the thready part of my problem below:
private void countText() {
try {
reader = new BufferedReader(new FileReader("this.txt"));
while (true) {
final String line = reader.readLine();
if(line == null) {break;}
lines++;
new Thread(new Runnable() {public void run() {chars += characterCounter(line);}}).start();
new Thread(new Runnable() {public void run() {words += wordCounter(line);}}).start();
println(line);
}
} catch(IOException ex) {return;}
}
(Sub Question: This is the first time I've asked about something and posted code. I don't want to use StackOverflow in place of google and wikipedia and am worried that this isn't an appropriate question? I tried to make the question more general so that I'm not just asking for help with my code... but, is there another website where this kind of question might be more appropriate?)
A different threaded design would make it easier to find and fix this kind of problem, and be more efficient into the bargain. This is a longish response, but the summary is "if you're doing threads in Java, check out java.util.concurrent as soon as humanly possible)".
I guess you're multithreading this code to learn threads rather than to speed up counting words, but that's a very inefficient way to use threads. You're creating two threads per line - two thousand threads for a thousand line file. Creating a thread (in modern JVMs) uses operating system resources and is generally fairly expensive. When two - let alone two thousand - threads have to access a shared resource (such as your chars
and words
counters), the resulting memory contention also hurts performance.
Making the counter variables synchronized
as Chris Kimpton suggests or Atomic
as WMR suggests will probably fix the code, but it will also make the effect of contention much worse. I'm pretty sure it will go slower than a single-threaded algorithm.
I suggest having just one long-lived thread which looks after chars
, and one for words
, each with a work queue to which you submit jobs each time you want to add a new number. This way only one thread is writing to each variable, and if you make changes to the design it'll be more obvious who's responsible for what. It'll also be faster because there's no memory contention and you're not creating hundreds of threads in a tight loop.
It's also important, once you've read all the lines in the file, to wait for all the threads to finish before you actually print out the values of the counters, otherwise you lose the updates from threads that haven't finished yet. With your current design you'd have to build up a big list of threads you created, and run through it at the end checking that they're all dead. With a queue-and-worker-thread design you can just tell each thread to drain its queue and then wait until it's done.
Java (from 1.5 and up) makes this kind of design very easy to implement: check out java.util.concurrent.Executors.newSingleThreadExecutor. It also makes it easy to add more concurrency later on (assuming proper locking etc), as you can just switch to a thread pool rather than a single thread.
As Chris Kimpton already pointed out correctly you have a problem with the updating of chars
and words
in different threads. Synchronizing on this
won't work either because this
is a reference to the current thread which means different threads will synchronize on different objects. You could use an extra "lock object" you can synchronize on but the easiest way to fix this would probably be to use AtomicIntegers for the 2 counters:
AtomicInteger chars = new AtomicInteger();
...
new Thread(new Runnable() {public void run() { chars.addAndGet(characterCounter(line));}}).start();
...
While this will probably fix your problem, Sam Stoke's more detailed answer is completely right, the original design is very inefficient.
To answer your question about when a thread "goes out of scope": You are starting two new threads for every line in your file and all of them will run until they reach the end of their run()
method. This is unless you make them daemon threads), in that case they'll exit as soon as daemon threads are the only ones still running in this JVM.
Sounds like a good question to me... I think the problem might be related to the atomicity of the chars += and words += - several threads could be calling that at the same time - do you do anything to ensure that there is no interleaving.
That is:
Thread 1, has chars = 10, wants to add 5
Thread 2, has chars = 10, wants to add 3
Thread 1 works out new total, 15
Thread 2 works out new total, 13
Thread 1 sets chars to 15
Thread 2 sets chars to 13.
Might be possible unless you use synchronized when updating those vars.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With