My program reads a text file line by line in a while loop. It then processes each line and extracts some information to be written in the output. Everything it does inside the while loop is O(1) except two ArrayList indexOf() method calls which I suppose are O(N). The program runs at a reasonable pace (1M lines per 100 seconds) in the beginning but over time it slows down dramatically. I have 70 M lines in the input file so the loop iterates 70 million times. In theory this should take about 2 hours but in practice it takes 13 hours. Where is the problem?
Here is the code snippet:
BufferedReader corpus = new BufferedReader(
new InputStreamReader(
new FileInputStream("MyCorpus.txt"),"UTF8"));
Writer outputFile = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream("output.txt"), "UTF-8"));
List<String> words = new ArrayList();
//words is being updated with relevant values here
LinkedHashMap<String,Integer> DIC = new LinkedHashMap();
//DIC is being updated with relevant key-value pairs here
String line = "";
while ((line = corpus.readLine()) != null)
String[] parts = line.split(" ");
if (DIC.containsKey(parts[0]) && DIC.containsKey(parts[1])) {
int firstIndexPlusOne = words.indexOf(parts[0])+ 1;
int secondIndexPlusOne = words.indexOf(parts[1]) +1;
outputFile.write(firstIndexPlusOne +" "+secondIndexPlusOne+" "+parts[2]+"\n");
} else {
notFound++;
outputFile.write("NULL\n");
}
}
outputFile.close();
The main reason that While is much slower is because the while loop checks the condition after each iteration, so if you are going to write this code, just use a for loop instead.
The “while” loop A single execution of the loop body is called an iteration. The loop in the example above makes three iterations. If i++ was missing from the example above, the loop would repeat (in theory) forever.
The Do/While Loop This loop will execute the code block once, before checking if the condition is true, then it will repeat the loop as long as the condition is true.
Basically, the infinite loop happens when the condition in the while loop always evaluates to true. This can happen when the variables within the loop aren't updated correctly, or aren't updated at all.
I am assuming you add words to your words
ArrayList
as you go.
You correctly state that words.indexOf
is O(N)
and that is the cause of your issue. As N
increases (you add words to the list) these operations take longer and longer.
To avoid this keep your list sorted and use binarySearch.
To keep it sorted use binarySearch
on each word to work out where to insert it. This takes your complexity from O(n)
to O(log(N))
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With