I am reading a book and deleting a number of words from it. My problem is that the process takes long time, and i want to make its performance better(Less time), example :
Vector<String> pages = new Vector<String>(); // Contains about 1500 page, each page has about 1000 words.
Vector<String> wordsToDelete = new Vector<String>(); // Contains about 50000 words.
for( String page: pages ) {
String pageInLowCase = page.toLowerCase();
for( String wordToDelete: wordsToDelete ) {
if( pageInLowCase.contains( wordToDelete ) )
page = page.replaceAll( "(?i)\\b" + wordToDelete + "\\b" , "" );
}
// Do some staff with the final page that does not take much time.
}
This code takes around 3 minutes to execute. If i skipped the loop of replaceAll(...) i can save more than 2 minutes. So is there a way to do the same loop with a faster performance ?
Yes, you can process page in a different way. The basic idea is following
for (String word : page) {
if (!forbiddenWords.contains(word)) {
pageResult.append(word);
}
}
Here forbiddenWords
is a set.
Also, for (String word : page)
is a shorthand for parsing page into a list of words. Don't forget to add whitespaces to result too (I skip that for clarity).
The complexity of processing one page in original version was ~ 50000*1000, while now it's only ~1000. (checking if word is in HashSet
takes constant time)
edit
Since I wanted to divert myself from work for ten minutes, here's the code :)
String text = "This is a bad word, and this is very bad, terrible word.";
Set<String> forbiddenWords = new HashSet<String>(Arrays.asList("bad", "terrible"));
text += "|"; // mark end of text
boolean readingWord = false;
StringBuilder currentWord = new StringBuilder();
StringBuilder result = new StringBuilder();
for (int pos = 0; pos < text.length(); ++pos) {
char c = text.charAt(pos);
if (readingWord) {
if (Character.isLetter(c)) {
currentWord.append(c);
} else {
// finished reading a word
readingWord = false;
if (!forbiddenWords.contains(currentWord.toString().toLowerCase())) {
result.append(currentWord);
}
result.append(c);
}
} else {
if (Character.isLetter(c)) {
// start reading a new word
readingWord = true;
currentWord.setLength(0);
currentWord.append(c);
} else {
// append punctuation marks and spaces to result immediately
result.append(c);
}
}
}
result.setLength(result.length() - 1); // remove end of text mark
System.out.println(result);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With