Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can i have a faster performance for this loop?

I am reading a book and deleting a number of words from it. My problem is that the process takes long time, and i want to make its performance better(Less time), example :

Vector<String> pages = new Vector<String>();  // Contains about 1500 page, each page has about 1000 words.
Vector<String> wordsToDelete = new Vector<String>();  // Contains about 50000 words.

for( String page: pages ) {
    String pageInLowCase = page.toLowerCase();

    for( String wordToDelete: wordsToDelete ) {
        if( pageInLowCase.contains( wordToDelete ) )
            page = page.replaceAll( "(?i)\\b" + wordToDelete + "\\b" , "" );
    }

    // Do some staff with the final page that does not take much time.
}

This code takes around 3 minutes to execute. If i skipped the loop of replaceAll(...) i can save more than 2 minutes. So is there a way to do the same loop with a faster performance ?

like image 952
Brad Avatar asked Dec 03 '22 03:12

Brad


1 Answers

Yes, you can process page in a different way. The basic idea is following

for (String word : page) {
    if (!forbiddenWords.contains(word)) {
        pageResult.append(word);
    }
}

Here forbiddenWords is a set.
Also, for (String word : page) is a shorthand for parsing page into a list of words. Don't forget to add whitespaces to result too (I skip that for clarity).

The complexity of processing one page in original version was ~ 50000*1000, while now it's only ~1000. (checking if word is in HashSet takes constant time)

edit
Since I wanted to divert myself from work for ten minutes, here's the code :)

    String text = "This is a bad word, and this is very bad, terrible word.";
    Set<String> forbiddenWords = new HashSet<String>(Arrays.asList("bad", "terrible"));

    text += "|"; // mark end of text
    boolean readingWord = false;
    StringBuilder currentWord = new StringBuilder();
    StringBuilder result = new StringBuilder();

    for (int pos = 0; pos < text.length(); ++pos) {
        char c = text.charAt(pos);
        if (readingWord) {
            if (Character.isLetter(c)) {
                currentWord.append(c);
            } else {
                // finished reading a word
                readingWord = false;
                if (!forbiddenWords.contains(currentWord.toString().toLowerCase())) {
                    result.append(currentWord);
                }

                result.append(c);
            }
        } else {
            if (Character.isLetter(c)) {
                // start reading a new word
                readingWord = true;
                currentWord.setLength(0);
                currentWord.append(c);
            } else {
                // append punctuation marks and spaces to result immediately
                result.append(c); 
            }
        }
    }

    result.setLength(result.length() - 1); // remove end of text mark
    System.out.println(result);
like image 152
Nikita Rybak Avatar answered Dec 26 '22 03:12

Nikita Rybak