Efficient method to replace multiple words in text

Question

Using JavaScript I need to efficiently remove ~10000 keywords from a ~100000 word document, of which ~1000 will be keywords. What approach would you suggest?

Would a massive regular expression be practical? Or should I just iterate through the document characters looking for keywords (boring)?

Edit:
Good point - only whole words, not parts. And some keywords contain spaces.
I am trying to do it all client side to reduce pressure on the backend.

Emil Ivanov · Accepted Answer

Using a regular expression might be a good option:

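var words = ['bon', 'mad'];
// Build /(bon|mad)/g from the list and replace every match with ''.
'joe bon joe mad'.replace(new RegExp('(' + words.join('|') + ')', 'g'), '');
// 'joe  joe ' (the spaces around each removed word remain)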

The regex¹ is simple (it needs nothing complicated such as look-ahead), and the regexp engine is written in C/C++, so you can expect it to be quite fast. Nevertheless, benchmark it and see whether the performance fits your needs.

I don't think that implementing your own parser will be faster, but I might be wrong - benchmark.
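A rough way to run such a benchmark is sketched below. The function name `timeRemoval`, the synthetic `kw…`/`word…` tokens, and the use of `Date.now()` for timing are all assumptions made for illustration; real keywords and real text may behave differently, so treat the numbers as a first impression only.

```javascript
// Hypothetical benchmark helper: builds a synthetic document and keyword
// list, then times one regex-based removal pass.
// Assumes keywordCount >= 1 and docWords >= 1.
function timeRemoval(docWords, keywordCount, hits) {
  var keywords = [];
  for (var i = 0; i < keywordCount; i++) keywords.push('kw' + i);

  // Place roughly `hits` keywords at evenly spaced positions among filler words.
  var step = Math.max(1, Math.floor(docWords / Math.max(1, hits)));
  var words = [];
  for (var j = 0; j < docWords; j++) {
    words.push(j % step === 0 ? keywords[(j / step) % keywordCount] : 'word' + j);
  }
  var text = words.join(' ');

  // The synthetic keywords contain no regex metacharacters, so no escaping here.
  var pattern = new RegExp('\\b(?:' + keywords.join('|') + ')\\b', 'g');

  var start = Date.now();
  var result = text.replace(pattern, '');
  var elapsedMs = Date.now() - start;

  return { elapsedMs: elapsedMs, removedChars: text.length - result.length };
}
```

Calling `timeRemoval(100000, 10000, 1000)` mirrors the sizes in the question. Run it several times in each browser you care about, since a single timing can be noisy.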

Sending the document to the server doesn't sound very good to me. With 100k words you are looking at a payload of several hundred kilobytes or more, and you still have to process it on the server and push it back.

¹ You might have to tune the regexp to do something with the spaces.
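One possible way to do the tuning mentioned in the footnote, covering the whole-word and keywords-with-spaces requirements from the question, is sketched here. The helper names `escapeRegExp` and `removeKeywords` are hypothetical, and the sketch assumes every keyword starts and ends with a word character (letter, digit, or underscore), since `\b` only matches at such boundaries.

```javascript
// Escape characters that have special meaning in a regular expression,
// so that keywords such as 'a.b' are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: remove whole-word keywords (which may contain spaces).
function removeKeywords(text, keywords) {
  if (keywords.length === 0) return text;
  var alternatives = keywords
    .slice()
    // Longer keywords first, so 'mad dog' is tried before 'mad'.
    .sort(function (a, b) { return b.length - a.length; })
    .map(escapeRegExp);
  // \b on both sides keeps 'bon' from matching inside 'bonus'.
  var pattern = new RegExp('\\b(?:' + alternatives.join('|') + ')\\b', 'g');
  return text
    .replace(pattern, '')
    // Collapse the runs of spaces left behind by removed keywords.
    .replace(/ {2,}/g, ' ')
    .trim();
}

removeKeywords('joe bon joe mad dog bonus', ['bon', 'mad', 'mad dog']);
// 'joe joe bonus'
```

Sorting longer keywords first matters because alternation takes the first alternative that matches: without it, 'mad' would be removed and ' dog' left behind. Whether collapsing the leftover spaces is desirable depends on how the document's whitespace is used.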
