How to optimize this query?
// This will return data ranging from 1 to 500,000 records
List<string> products = GetProductsNames();
List<Product> actualProducts = (from p in db.Products
where products.Contains(p.Name)
select p).ToList();
This code takes around 30 seconds to fill actualProducts when I pass in a list of 44,000 strings; I don't know how long it would take for 500,000 records. :(
Is there any way to tweak this query?
NOTE: each call takes almost this long (ignoring the slow first EDMX call).
An IN query on 500,000 records is always going to be a pathological case.
Firstly, make sure there is an index (probably non-clustered) on Name in the database.
Ideas (both involve dropping to ADO.NET):
1. Use a table-valued parameter to pass in the names, and INNER JOIN to the table-valued parameter in TSQL.
2. Alternatively, create a table such as ProductQuery with columns QueryId (which could be uniqueidentifier) and Name; invent a guid to represent your query (Guid.NewGuid()), and then use SqlBulkCopy to push the 500,000 pairs (the same guid on each row; different guids are different queries) into the table really quickly; then use TSQL to do an INNER JOIN between the two tables.
Actually, these are very similar, but the first one is probably the first thing to try. Less to set up.
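The table-valued-parameter idea might look roughly like the sketch below. It is only an illustration under assumptions: the table type dbo.NameList, the table name dbo.Products, the class and method names, and the connection string are all hypothetical, and System.Data.SqlClient is assumed to be available. It returns only the matching names, since the columns of Product are not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;

public static class ProductLookup
{
    // Assumed one-time setup in the database (hypothetical names):
    //   CREATE TYPE dbo.NameList AS TABLE (Name nvarchar(200) NOT NULL);

    // Packs the names into a single-column DataTable shaped like dbo.NameList.
    public static DataTable ToNameTable(IEnumerable<string> names)
    {
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        // Distinct() drops exact repeats so the join does not return
        // the same product once per repeated name.
        foreach (var name in names.Distinct())
            table.Rows.Add(name);
        return table;
    }

    // Sends all names in one round trip and lets SQL Server do the join.
    public static List<string> FindExistingNames(string connectionString,
                                                 IEnumerable<string> names)
    {
        const string sql =
            "SELECT p.Name FROM dbo.Products AS p " +
            "INNER JOIN @names AS n ON n.Name = p.Name;";

        var found = new List<string>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            // A structured parameter carries the whole DataTable as a TVP.
            var parameter = command.Parameters.Add("@names", SqlDbType.Structured);
            parameter.TypeName = "dbo.NameList";
            parameter.Value = ToNameTable(names);

            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                    found.Add(reader.GetString(0));
            }
        }
        return found;
    }
}
```

With an index on Name, the server can resolve the join with seeks or a merge instead of parsing a 500,000-item IN list.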
If you don't want to use the database, you could try something with Dictionary<string,string>.
If I'm not wrong, products.Contains(p.Name) is expensive because List<T>.Contains is an O(n) operation. Try changing the return type of GetProductsNames to Dictionary<string,string>, or convert the List to a Dictionary:
Dictionary<string, string> productsDict = products.ToDictionary(x => x);
Now that you have a dictionary in hand, rewrite the query as below:
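// AsEnumerable() runs the filter in memory: LINQ to Entities cannot translate
// ContainsKey to SQL. Note that this reads the whole Products table.
List<Product> actualProducts = (from p in db.Products.AsEnumerable()
where productsDict.ContainsKey(p.Name)
select p).ToList();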
This should improve performance a lot (the disadvantage is that you allocate roughly double the memory; the advantage is speed). I tested with very large samples with good results. Try it out.
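If only membership matters, a HashSet<string> offers the same average O(1) lookup while storing keys only. This is a swapped-in variation on the dictionary idea, not what was tested above. A minimal sketch, assuming the products are already in memory; the Product class here is a hypothetical stand-in for the entity, and ProductFilter and FilterByName are made-up names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entity class; only Name matters here.
public class Product
{
    public string Name { get; set; }
}

public static class ProductFilter
{
    // Keeps the products whose Name appears in wantedNames.
    // HashSet<string>.Contains is O(1) on average, versus O(n)
    // for List<string>.Contains.
    public static List<Product> FilterByName(IEnumerable<Product> products,
                                             IEnumerable<string> wantedNames)
    {
        // Duplicate names in the input are simply ignored by the set.
        var wanted = new HashSet<string>(wantedNames);
        return products
            .Where(p => p.Name != null && wanted.Contains(p.Name))
            .ToList();
    }
}
```

It could be called as ProductFilter.FilterByName(db.Products.AsEnumerable(), products), keeping in mind that this pulls every row of the table into memory before filtering.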
Hope this helps.