Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Linq select match list from other list performance

I have a class like this

public class Category 
{
   CategoryID
   List<Product> ProductList
}
public class Product 
{
   CategoryID
}

List<Category> around 15k row and List <Product> around 40k row. I'm doing with LINQ like this

CategoryList.ForEach(i =>
{
    i.ProductList = ProductList.Where(x => x.CategoryID == i.CategoryID).ToList();
});

I want to know there are any implement better performance for this. Data could increase or decrease in other case

like image 610
user1392853 Avatar asked Aug 17 '15 06:08

user1392853


2 Answers

I think it's silly to use parallelism, when a simple change to the algorithm can speed it up thousandfold. Assuming CategoryID has a decent implementation of GetHashCode() a dictionary/lookup has nearly O(1) lookup times compared to the O(n) time of scanning through a list.

Two possible approaches:

Turn categories into a dictionary

Assuming categories have unique IDs you can use:

var categoryById = categories.ToDictionary(c => c.CategoryID);
foreach(var product in products)
{
    var category = categoryById[product.CategoryID];
    category.Products.Add(product);
}

This has runtime O(products.Count + categories.Count) and uses O(categories.Count) memory.

If categories don't start out with an empty list, you might need to create it.

Turn products into a Lookup

var productsByCategory = products.ToLookup(product => product.CategoryID);
foreach(var category in categories)
{
    category.Products = products[category.CategoryID].ToList();
}

This has runtime O(products.Count + categories.Count) and uses O(products.Count) memory.

Since there are typically more products than categories, this approach will require more memory. On the other hand the lookup might remove the need to embed a list of products in the category object.

like image 98
CodesInChaos Avatar answered Oct 31 '22 14:10

CodesInChaos


You can use parallel implementation of LINQ - PLINQ. Usually PLINQ increases the speed of LINQ, because it's using all available cores more efficiently.

CategoryList.AsParallel().ForAll(i =>
     {
         i.ProductList = ProductList.AsParallel.Where(x => x.CategoryID == i.CategoryID).ToList();
     });
like image 26
Artem Kulikov Avatar answered Oct 31 '22 14:10

Artem Kulikov