Algorithm used by browser for searching words in the webpage

Question

Which data structure or algorithm is used in browsers to search for a word? Will the browsers construct a trie or suffix tree?

Thank you
Bala

Jerry Coffin · Accepted Answer

Searching with a trie/suffix tree is fast -- but building the trie to start with is considerably slower. That means they only make sense when you expect to carry out a large number of searches over the same data, so you can amortize the time to build the trie over many searches.

The average number of searches inside a web page is probably fractional (i.e. you expect the user to load several pages before doing a search even once). Even when you do search a page, doing a lot of searches in the same page is probably pretty rare.

That means a linear search will almost always be substantially more efficient overall than a trie or suffix tree. My guess is that if they bother optimizing it beyond a simple call to strstr() that they only go as far as something in the Boyer-Moore family of string searching. Given the number of searches you expect in a web page, this will normally finish them all before you could just do the initial build of the trie, so you could start searching with it.

For interactive use, your primary concern is producing results fast enough to appear instantaneous. That generally means results within 100ms or so. With a good implementation of Boyer-Moore-Horspool, that's enough time to search an amount of text that would be insane to include in a single web page (on the order of hundreds of megabytes or gigabytes).

If you want to put it to the test, I'd recommend Ray Gardner's implementation of Boyer-Moore-Horspool (Bmhsrch.C, from Bob Stout's Snippets site). I'd truly hate to see a web page large enough to keep it occupied for even 20 ms, not to mention 100 (though I'm the first to admit this particular implementation is exceptionally fast).

Eli Bendersky · Answer

Web pages are usually not large enough to need sophisticated search algorithms, at least on the first scan. I mean that you can probably find any word with a simple linear search in just a few ms. An optimization could be to build a trie during the first scan and then use it for subsequent searches.

Overall, I don't think this is one of the big issues in browser algorithms.

Algorithm used by browser for searching words in the webpage

Tags:

browser

algorithm

data-structures

Boolean

2 Answers

Jerry Coffin

Eli Bendersky

Recent Activity

Donate For Us

Algorithm used by browser for searching words in the webpage

Tags:

browser

algorithm

data-structures

Boolean

2 Answers

Jerry Coffin

Eli Bendersky

Related questions

Recent Activity

Donate For Us