
Chrome extension history API not showing all results?

I am trying to use the Chrome extension history API to get the user's history for a search term they enter, but the search does not work correctly in some cases. For example, when I enter the term "bi", no results are returned. When I search for "bit", some results are returned but not all of them: I verified this against the Chrome history search, which showed more results. Is this how the history API works, or am I doing something wrong? Here is my code:

window.onload = function() {

  function getHistory() {
    var list = document.getElementById('list');
    var box = document.getElementById('box').value;
    if (box === '') {
      list.innerHTML = 'Nothing To Search.';
    }
    else {
      // Despite its name, this value is in milliseconds (roughly 45 years).
      // Note that `start` is computed but never passed to the search below,
      // which uses startTime: 0 instead.
      var microseconds = 1000 * 60 * 60 * 24 * 365 * 45;
      var start = (new Date).getTime() - microseconds;
      chrome.history.search({text: box, startTime: 0, maxResults: 50000}, function(data) {
        if (Object.keys(data).length === 0) {
          list.innerHTML = 'Nothing Found.';
        }
        else {
          list.innerHTML = '';
          // Append one list entry per matching history item.
          data.forEach(function(page) {
            list.innerHTML = list.innerHTML + '<li><p>'+page.title+'</p> <a href='+page.url+' target="_blank"><p>'+page.url+'</p></a></li> <hr>';
          });
        }
      });
    }
  }

  document.getElementById('search').onclick = getHistory;
}

Thank you.

doctorsherlock asked Feb 07 '23 20:02


1 Answer

I'm seeing the same behaviour with an extension that I am writing. It is really quite annoying, so I went digging through the Chromium source code to find out what it's really doing to match the history results.

Short answer: Judging from the source code, this behaviour is intended. If we want every match for a text query, we are stuck with retrieving all of the history results and searching for matches ourselves in JavaScript. As a side note, double-check the start/end times and make sure your 'maxResults' property is large enough, since wrong values for any of these properties will likely give you unexpected results.
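As a rough illustration of that workaround, here is a minimal sketch, not a definitive implementation. The names filterHistoryItems and searchHistoryBySubstring are hypothetical helpers made up for this example. It assumes that an empty 'text' query returns all pages (as the chrome.history documentation describes) and reuses the startTime and maxResults values from the question.

```javascript
// Hypothetical helper: case-insensitive substring match on URL and title.
// It has no dependency on the chrome.* APIs, so it can be tested in isolation.
function filterHistoryItems(items, term) {
  const needle = term.toLowerCase();
  return items.filter(function (item) {
    const url = (item.url || '').toLowerCase();
    const title = (item.title || '').toLowerCase();
    return url.includes(needle) || title.includes(needle);
  });
}

// Hypothetical wrapper: fetch everything with an empty text query, then
// filter locally instead of relying on the built-in word matching.
function searchHistoryBySubstring(term, callback) {
  chrome.history.search(
    { text: '', startTime: 0, maxResults: 50000 },
    function (items) {
      callback(filterHistoryItems(items, term));
    }
  );
}
```

In the question's code, getHistory could call searchHistoryBySubstring(box, ...) in place of chrome.history.search and render the filtered array the same way. Keep in mind that this pulls the whole history into memory on every search, so it may be slow for very large histories.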

Long answer

DISCLAIMER: I don't have much C++ experience, so please correct my assessment if it is wrong.

The following URLDatabase method is eventually called (by way of history_backend.cc) after you call chrome.history.search with a non-empty text query.

bool URLDatabase::GetTextMatchesWithAlgorithm(
    const base::string16& query,
    query_parser::MatchingAlgorithm algorithm,
    URLRows* results) {
  query_parser::QueryNodeVector query_nodes;
  query_parser_.ParseQueryNodes(query, algorithm, &query_nodes);

  results->clear();
  sql::Statement statement(GetDB().GetCachedStatement(SQL_FROM_HERE,
      "SELECT" HISTORY_URL_ROW_FIELDS "FROM urls WHERE hidden = 0"));

  while (statement.Step()) {
    query_parser::QueryWordVector query_words;
    base::string16 url = base::i18n::ToLower(statement.ColumnString16(1));
    query_parser_.ExtractQueryWords(url, &query_words);
    GURL gurl(url);
    if (gurl.is_valid()) {
      // Decode punycode to match IDN.
      base::string16 ascii = base::ASCIIToUTF16(gurl.host());
      base::string16 utf = url_formatter::IDNToUnicode(gurl.host());
      if (ascii != utf)
        query_parser_.ExtractQueryWords(utf, &query_words);
    }
    base::string16 title = base::i18n::ToLower(statement.ColumnString16(2));
    query_parser_.ExtractQueryWords(title, &query_words);

    if (query_parser_.DoesQueryMatch(query_words, query_nodes)) {
      URLResult info;
      FillURLRow(statement, &info);
      if (info.url().is_valid())
        results->push_back(info);
    }
  }
  return !results->empty();
}

The algorithm query_parser::MatchingAlgorithm passed into this function refers to the enum shown below (from query_parser.h), and is never explicitly set from what I can tell, so it will be the DEFAULT value.

enum class MatchingAlgorithm {
  // Only words long enough are considered for prefix search. Shorter words are
  // considered for exact matches.
  DEFAULT,
  // All words are considered for a prefix search.
  ALWAYS_PREFIX_SEARCH,
};

Read the comment above the DEFAULT option -

"Only words long enough are considered for prefix search. Shorter words are considered for exact matches"

The algorithm itself (query_parser.cc) breaks down your text query and the raw URL results into lists of "words" separated by spaces or punctuation, and checks for 'prefix matches' between each pair. This explains why if you have several pages in your history with the text "chromium" in the URL, you will get no results if you search for "hromium", but you'll get all of them if you search for "chro".

In your case, I think the search "bi" returns no results because the algorithm only looks for exact word matches for short terms, meaning that "bi" would need to be surrounded by whitespace or punctuation in the URL/title. You can confirm this by doing a Google search for "bi" and then querying the history again for "bi". The Google search history item will be matched, since in the URL of the Google search the "bi" stands alone as a word, preceded by punctuation and followed by the end of the URL:

https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=bi
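To make the matching rule described above concrete, here is a simplified JavaScript sketch of it. This is not the Chromium implementation: extractWords and doesQueryMatch are hypothetical names, the word splitting is a plain regular expression rather than Chromium's own word breaking, and the threshold minPrefixLength = 3 is an assumption chosen to be consistent with the "bi" versus "bit" behaviour seen in the question.

```javascript
// Split text into lowercase "words" on anything that is not a letter or digit.
function extractWords(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Every query word must match at least one word from the URL or title.
// Query words of at least minPrefixLength characters match as prefixes;
// shorter ones must equal a whole word. (The threshold is an assumption.)
function doesQueryMatch(query, url, title, minPrefixLength = 3) {
  const candidateWords = extractWords(url).concat(extractWords(title));
  return extractWords(query).every(function (queryWord) {
    return candidateWords.some(function (word) {
      return queryWord.length >= minPrefixLength
        ? word.startsWith(queryWord)
        : word === queryWord;
    });
  });
}
```

Under these assumptions, doesQueryMatch('chro', 'https://www.chromium.org/', '') matches while 'hromium' does not, and 'bi' only matches when it appears as a standalone word, as it does in the Google search URL above.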

Sources

  • Chromium source code that is searchable
  • history_types.h - enum for algorithm
  • query_parser.cc - algorithm itself
  • history_service.cc - called from JavaScript
  • history_backend.cc - called from history service

JDune answered Feb 11 '23 00:02