Increasing performance on text processing

Question

I have written a program that highlights all instances of a chosen word class (for example nouns or prepositions) in a text. This is how I do it:

Make an array of words from the entire text.
Iterate over this array. For each word, look at its first letter.
- Jump to the corresponding array in an object holding all words of the selected word class (e.g. 'S') and iterate over it. If the word is found, push it into an array of matches and break.
After all words are checked, iterate over the array of matches and highlight each one in the text.
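
For reference, the lookup described in these steps might look roughly like the sketch below. This is not the asker's actual code: the function name `findMatchesLinear`, the sample word list, and the whitespace-only tokenizing are assumptions made for illustration.

```javascript
// Hypothetical word list, grouped by first letter as described in the question.
const nounsByLetter = {
  S: ["Sand", "Sea", "Sun"],
  T: ["Table", "Tree"],
};

// For every word of the text, scan the bucket for its first letter
// until the word is found or the bucket is exhausted.
function findMatchesLinear(text, wordsByLetter) {
  // Simplification: split on whitespace only, punctuation is not stripped.
  const words = text.split(/\s+/).filter(Boolean);
  const matches = [];
  for (const word of words) {
    const bucket = wordsByLetter[word[0].toUpperCase()] || [];
    for (const candidate of bucket) {
      if (candidate === word) {
        matches.push(word);
        break; // stop scanning this bucket once the word is found
      }
    }
  }
  return matches;
}

const linearMatches = findMatchesLinear("The Sun and the Sea", nounsByLetter);
```

In a scheme like this, the cost per word grows with the size of the bucket being scanned, which would be consistent with a large word class such as nouns taking far longer than a small one such as prepositions.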

On my machine, a text of 240,000 words takes about 100 seconds to process for nouns and about 4.5 seconds for prepositions.

I am looking for a way to improve performance, and these are the ideas I could come up with:

Rearrange the items in each block of my word list. Sort them so that, if the word starts with a vowel, all items that have a consonant as their second character come first, and vice versa (on the assumption that words with double vowels or double consonants are rare).
Structure the text into chapters and process only the currently shown chapter.

Are these solid ideas, and are there other ideas or proven techniques to improve this kind of processing?

DrC · Accepted Answer

Use the power of JavaScript.

JavaScript treats lookups in objects with string keys as a fundamental operation. For each word class, build an object with every possible word as a key and some simple value like true or 1. Checking a word is then simply typeof(wordClass[word]) !== "undefined". I expect this to be much, much faster.
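
A minimal sketch of that idea, assuming a small made-up word list and the hypothetical helper names `buildLookup` and `findMatchesByLookup`. It uses `Object.create(null)` rather than a plain `{}` so that inherited keys such as "toString" or "constructor" cannot match by accident; a `Set` would serve the same purpose.

```javascript
// Hypothetical word list; in practice this would be the full list for one word class.
const nounList = ["house", "dog", "table"];

// Build the lookup object once per word class.
// Object.create(null) yields an object with no inherited properties.
function buildLookup(words) {
  const lookup = Object.create(null);
  for (const w of words) {
    lookup[w.toLowerCase()] = true;
  }
  return lookup;
}

// One property lookup per word instead of a scan through an array.
function findMatchesByLookup(text, lookup) {
  // Simplification: treat runs of ASCII letters as words.
  const words = text.match(/[A-Za-z]+/g) || [];
  return words.filter((w) => lookup[w.toLowerCase()] === true);
}

const nounLookup = buildLookup(nounList);
const lookupMatches = findMatchesByLookup("The dog left the house. toString!", nounLookup);
```

The lookup object is built once and reused for every word in the text, so the per-word cost no longer depends on how long the word list is.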

Regular expressions are another highly optimized area of JavaScript. You can probably do the whole thing as one massive regular expression for each word class. If your highlighting is in HTML, you can also just use a replace with the RE to get the result. Whether this works likely depends on just how big your word sets are.
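
The regular expression variant could be sketched as follows. The helper names, the sample preposition list, and the `prep` CSS class are assumptions for illustration, and the input is assumed to be plain text without existing markup. Note that `\b` in JavaScript only treats ASCII letters, digits, and underscore as word characters, so word lists with accented letters or umlauts would need a different boundary check.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one alternation of all words, anchored at word boundaries.
function buildWordClassRegExp(words) {
  const alternation = words.map(escapeRegExp).join("|");
  return new RegExp("\\b(?:" + alternation + ")\\b", "gi");
}

// Wrap every match in a span so it can be styled with CSS.
function highlight(text, re, className) {
  return text.replace(re, (match) => '<span class="' + className + '">' + match + "</span>");
}

// Hypothetical word list for one word class.
const prepositions = ["in", "into", "under"];
const prepositionRe = buildWordClassRegExp(prepositions);
const highlighted = highlight("Put it in the box under the table", prepositionRe, "prep");
```

The expression is compiled once per word class and then applied to the whole text in a single `replace` call, which also takes care of the highlighting step.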

Tags:

performance

javascript

text

process

Wottensprels
