How to Efficiently Use a Where Clause or Select in Parallel LINQ on a Large Dataset

Tags:

c#

linq

plinq

I have approximately 250,000 records marked as Boss; each Boss has 2 to 10 Staff, for a total of approximately 1,000,000 staff. Every day I need to get the details of the Staff. I'm using LINQ to get the unique list of Staff who worked on a given day. Consider the following C# LINQ query and models:

void Main()
{

    List<Boss> BossList = new List<Boss>()
    {
        new Boss()
        {
            EmpID = 101,
            Name = "Harry",
            Department = "Development",
            Gender = "Male",
            Employees = new List<Person>()
            {
                new Person() {EmpID = 102, Name = "Peter", Department = "Development",Gender = "Male"},
                new Person() {EmpID = 103, Name = "Emma Watson", Department = "Development",Gender = "Female"},

            }
        },
        new Boss()
        {
            EmpID = 104,
            Name = "Raj",
            Department = "Development",
            Gender = "Male",
            Employees = new List<Person>()
            {
                new Person() {EmpID = 105, Name = "Kaliya", Department = "Development",Gender = "Male"},
                new Person() {EmpID = 103, Name = "Emma Watson", Department = "Development",Gender = "Female"},

            }
        },

        // ..... ~ 250,000 Records ......

    };

    List<Person> staffList = BossList
    .SelectMany(x =>
        new[] { new Person { Name = x.Name, Department = x.Department, Gender = x.Gender, EmpID = x.EmpID } }
        .Concat(x.Employees))
    .GroupBy(x => x.EmpID) //Group by employee ID
    .Select(g => g.First()) //And select a single instance for each unique employee
    .ToList();
}

public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
    public string Department { get; set; }
    public string Gender { get; set; }
}

public class Boss
{
    public int EmpID { get; set; }
    public string Name { get; set; }
    public string Department { get; set; }
    public string Gender { get; set; }
    public List<Person> Employees { get; set; }
}

The LINQ query above gives me the list of distinct employees (staff); the list contains more than 1,000,000 records. From that list I need to search for "Raj":

staffList.Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant()));

This operation takes 3 to 5 minutes to return the result.

How can I make it more efficient?

B.Balamanigandan asked Mar 09 '16 11:03


1 Answer

If you change Boss to inherit from Person (public class Boss : Person), you no longer need to duplicate the Person properties in Boss, and you don't have to create a new Person instance for each Boss, because a Boss is already a Person:

IEnumerable<Person> staff = BossList
    .Concat(BossList
        .SelectMany(x => x.Employees)
    )
    .DistinctBy(p => p.EmpID)
    .ToList();

where DistinctBy is defined as:

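// Extension methods must live in a non-generic static class; the class name is arbitrary.
public static class EnumerableExtensions
{
    // Yields the first element seen for each distinct key, preserving source order.
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>
        (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            // HashSet.Add returns false when the key was already present.
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}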
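
To see how these pieces might fit together, here is a minimal, self-contained sketch using the sample data from the question. It assumes Boss has been changed to derive from Person as suggested. StaffExtensions, UniqueStaff and DistinctByKey are illustrative names, not part of the original code; the helper is renamed DistinctByKey only to avoid clashing with the built-in Enumerable.DistinctBy that ships with .NET 6 and later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
    public string Department { get; set; }
    public string Gender { get; set; }
}

// Assumption: Boss derives from Person, so it inherits the four shared properties.
public class Boss : Person
{
    public List<Person> Employees { get; set; } = new List<Person>();
}

public static class StaffExtensions
{
    // Same logic as the DistinctBy helper; hypothetical name to avoid a clash on .NET 6+.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }

    // Bosses first, then their employees, keeping one entry per EmpID.
    public static List<Person> UniqueStaff(IReadOnlyCollection<Boss> bosses)
    {
        return bosses
            .Cast<Person>()
            .Concat(bosses.SelectMany(b => b.Employees))
            .DistinctByKey(p => p.EmpID)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var bossList = new List<Boss>
        {
            new Boss
            {
                EmpID = 101, Name = "Harry", Department = "Development", Gender = "Male",
                Employees = new List<Person>
                {
                    new Person { EmpID = 102, Name = "Peter", Department = "Development", Gender = "Male" },
                    new Person { EmpID = 103, Name = "Emma Watson", Department = "Development", Gender = "Female" },
                }
            },
            new Boss
            {
                EmpID = 104, Name = "Raj", Department = "Development", Gender = "Male",
                Employees = new List<Person>
                {
                    new Person { EmpID = 105, Name = "Kaliya", Department = "Development", Gender = "Male" },
                    new Person { EmpID = 103, Name = "Emma Watson", Department = "Development", Gender = "Female" },
                }
            },
        };

        List<Person> staff = StaffExtensions.UniqueStaff(bossList);
        Console.WriteLine(staff.Count); // prints 5: Emma Watson (103) is counted once
    }
}
```

Compared with GroupBy followed by First, this streams through the data once and stores only the keys already seen, rather than building a group for every employee.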

Also, in your comparison you're converting every Name to lowercase before comparing, which creates a new string for each of the million records. You don't need that. Instead, try something like:

staffList.Where(m => m.Name.Equals("Raj", StringComparison.InvariantCultureIgnoreCase));

Also, be aware that your use of Contains would also match names like Rajamussen and mirajii, which is possibly not what you were expecting.
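
If a substring match is what you actually want, a rough sketch of a search that still avoids the per-record lowercase copy is to use IndexOf with a StringComparison, optionally combined with PLINQ's AsParallel(). NameSearch, FindByNameContains and FindByNameEquals are hypothetical names. StringComparison.OrdinalIgnoreCase is used on the assumption that culture-specific rules are not needed; swap in InvariantCultureIgnoreCase to match the comparison above. Whether AsParallel() helps at all depends on the data and the machine, so measure before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down Person for the example; only the members used here.
public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
}

public static class NameSearch
{
    // Case-insensitive substring match without allocating a lowercase copy per record.
    // AsParallel() does not preserve source order unless AsOrdered() is added.
    public static List<Person> FindByNameContains(IEnumerable<Person> staff, string term)
    {
        return staff
            .AsParallel()
            .Where(p => p.Name != null
                     && p.Name.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    // Case-insensitive exact match; string.Equals tolerates a null Name.
    public static List<Person> FindByNameEquals(IEnumerable<Person> staff, string name)
    {
        return staff
            .Where(p => string.Equals(p.Name, name, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var staffList = new List<Person>
        {
            new Person { EmpID = 1, Name = "Raj" },
            new Person { EmpID = 2, Name = "Rajamussen" },
            new Person { EmpID = 3, Name = "mirajii" },
            new Person { EmpID = 4, Name = "Peter" },
        };

        Console.WriteLine(NameSearch.FindByNameContains(staffList, "Raj").Count); // prints 3
        Console.WriteLine(NameSearch.FindByNameEquals(staffList, "raj").Count);   // prints 1
    }
}
```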

Scott Baker answered Nov 15 '22 00:11