Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does Firefox's 'awesome' bar match strings?

The question is how the string matching is done to find matching entries by the firefox 3 url bar. Substring matching on every entry might be slow. What algorithm can be used to match at any location in a fast way?

like image 553
amit Avatar asked Feb 12 '09 10:02

amit


1 Answers

The algorithm the location bar in Firefox 3.0 is bit complicated. It will get data from two (three for Firefox 3.5 and later) different queries:

  • For the first query, it checks the moz_inputhistory table to see if the current search string is stored in that table. These results are sorted by "rank" which is a number that determines how recently it is used. This number degrades once a day. This search is what makes the location bar adaptive to what you select over time (implemented in bug 95739).

  • In Firefox 3.5 and later, it then checks the moz_keyword table for bookmarks with a keyword that match the search text exactly.

  • The last query, it goes through every entry in moz_places, which will include all history visits and bookmarks. These results are ordered by frecency.

For all three of these cases, the following algorithm is used for matching against the tags, title, and the url (called "searchable text" below). This is a bit difficult to explain in words, so it might be easier to look at the source code.

  1. The search string is broken into tokens determined by whitespace (each non-whitespace "word" is a token).
  2. For each token, start comparing each character of the searchable text and the token in a Unicode, case-insensitive way until it matches completely. If one set of characters do not match, advance to the next "word boundary" in the searchable text and try again.
  3. If we match the any one of the searchable text, it will show up in the location bar.
  4. If we do not have enough results (default is 12), we will then redo the search of the query going through every bookmark and history visit, and test the searchable text in a Unicode, case-insensitive way for each token anywhere (not just at word boundaries).

Hopefully that explains it in an understandable way!

like image 156
sdwilsh Avatar answered Sep 19 '22 14:09

sdwilsh