Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implement text index in C# memory

Tags:

memory

caching

I have a performance sensitive task and I am considering storing of all objects which is about 100,000 items in memory. (persistent at ms sql, but copy in memory to improve complex search performance)

Search by key works fast enough, but search by text, eg. Contains is relatively slow - it takes about 30ms per each query like this:

IEnumerable<Product> result =
   products.Where(p =>
   p.Title.Contains(itemnames[rnd.Next(itemnames.Length)]));

I already tried to use memory database db4o but it performance is even worse - about 1.5 seconds per search in 100K items.

What's options in order not to review every object Title and perform this faster?

What memory database can i use to solve this task?

like image 680
st78 Avatar asked Nov 06 '22 02:11

st78


1 Answers

Do you have the option of changing the data structure in which your products are stored? One way you can speed up your Contains search is to store every possible Product.Title substring in a Dictionary<string, List<Product>>. This will allow your search to be O(1) instead of O(n).

You can generate every substring like so:

public static IEnumberable<string> AllSubstrings(this string value)
{
    int index = 0;
    while(++index <= value.Length)
    {
        yield return value.Substring(0, index);
    }

    index = 0;
    while(++index <= value.Length - 1)
    {
        yield return value.Substring(index);
    }
}

Then you can populate your dictionary like so:

var titleIndex = new Dictionary<string, List<Product>>();

foreach(Product product in products)
{
    foreach(string substring in product.Title.AllSubstrings())
    {
        if(titleIndex.ContainsKey(substring))
        {
            index[substring].Add(product);
        }
        else
        {
            index[substring] = new List<Product> { product };
        }
    }
}

And lastly, you perform your search like so:

string searchString = itemnames[rnd.Next(itemnames.Length)];

if(titleIndex.ContainsKey(searchString))
{
    List<Product> searchResults = titleIndex[searchString];
}

Note: As you may have guessed, storing your data like this takes more CPU time up front and uses more RAM.

like image 159
tdaines Avatar answered Nov 09 '22 10:11

tdaines