Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

High-speed string matching in C#

Tags:

string

c#

search

I have a list of about 10,000 staff members in a List<T> and I have a ListBox which contains a subset of those staff, depending on the search term in a text box.

Say a Staff object has the following publicly exposed properties:

string FirstName
string LastName
string MiddleName
   int StaffID
   int CostCentre

I could write a function like this:

bool staffMatchesSearch(Staff stf)
{
  if (tbSrch.Text.Trim() == string.Empty)
    return true; // No search = match always.

  string s = tbSrch.Text.Trim().ToLower();

  // Do the checks in the order most likely to return soonest:
  if (stf.LastName.ToLower().Contains(s))
    return true;
  if (stf.FirstName.ToLower().Contains(s))
    return true;
  if (stf.MiddleName.ToLower().Contains(s))
    return true;

  if (stf.CostCentre.ToString().Contains(s))
    return true; // Yes, we want partial matches on CostCentre
  if (stf.StaffID.ToString().Contains(s))
    return true; // And also on StaffID

  return false;
}

and then do something like:

tbSrch_TextChanged(object sender, EventArgs e)
{
  lbStaff.BeginUpdate();
  lbStaff.Items.Clear();

  foreach (Staff stf in staff)
    if (staffMatchesSearch(stf))
      lbStaff.Items.Add(stf);

  lbStaff.EndUpdate();
}

The filtering is re-evaluated every time the user changes the contents of the tbSrch box.

This works, and it's not awfully slow, but I was wondering if I could make it any faster?

I have tried to re-write the whole thing to be multi-threaded, however with only 10,000 staff members the overhead seemed to take away the bulk of the benefit. Also, there were a bunch of other bugs like if searching for "John", the user first presses "J" which spools up the threads, but when the user presses the "o" another set are spooled up before the first lot have had a chance to return their results. A lot of the time, the results get returned in a jumbled order and all sorts of nasty things happen.

I can think of a few tweaks that would make the best-case scenario significantly better, but they would also make the worst-case scenario a lot worse.

Do you have any ideas on how this can be improved?


Great suggestions I've implemented far, and their results:

  • Add a delay on the ValueChanged event so that if the user types a 5-character name quickly on the keyboard, it only performs 1 search at the end rather than 5 in series.
  • Pre-evaluate ToLower() and store in the Staff class (as a [NonSerialized] attribute so it doesn't take up extra space in the save file).
  • Add a get property in Staff which returns all the search criteria as a single, long, lower-case, concatenated string. Then run a single Contains() on that. (This string is stored in the Staff object so it only gets constructed once.)

So far, these have lowered search times from around 140ms to about 60ms (though these numbers are highly subjective depending on the actual search performed and number of results returned).

like image 266
Ozzah Avatar asked Nov 16 '11 03:11

Ozzah


1 Answers

1) as pointed out in the comments, you probably shouldn't .ToString the numeric fields - just match the numbers

2) the ToLower calls are a perf drain. Add lowercased versions of these properties to the Staff class so the ToLower only has to be done once

3) when the user enters another character, you don't need to reevaluate all items. entering a character will only reduce the number of matches so only reevaluate the previous matches.

4) use a timer to introduce a delay between when the user types and when you kick off a search. if they are typing multiple characters quickly, you might as well wait until they've paused for half a second

5) Check the key pressed. If NaN then don't check the int properties.

like image 115
2 revs, 2 users 86% Avatar answered Sep 25 '22 07:09

2 revs, 2 users 86%