Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count or Skip(1).Any() where I want to find out if there is more than 1 record - Entity Framework

I'm not sure when but I read an article on this which indicates that the usage of Skip(1).Any() is better than Count() compassion when using Entity Framework (I may remember wrong). I'm not sure about this after I saw the generated T-SQL code.

Here is the first option:

int userConnectionCount = _dbContext.HubConnections.Count(conn => conn.UserId == user.Id);
bool isAtSingleConnection = (userConnectionCount == 1);

This generates the following T-SQL code which is reasonable:

SELECT 
[GroupBy1].[A1] AS [C1]
FROM ( SELECT 
  COUNT(1) AS [A1]
    FROM [dbo].[HubConnections] AS [Extent1]
    WHERE [Extent1].[UserId] = @p__linq__0
)  AS [GroupBy1]

Here is the other option which is the suggested query as far as I remember:

bool isAtSingleConnection = !_dbContext
    .HubConnections.OrderBy(conn => conn.Id)
    .Skip(1).Any(conn => conn.UserId == user.Id);

Here is the generated T-SQL for the above LINQ query:

SELECT 
CASE WHEN ( EXISTS (SELECT 
    1 AS [C1]
    FROM ( SELECT [Extent1].[Id] AS [Id], [Extent1].[UserId] AS [UserId]
        FROM ( SELECT [Extent1].[Id] AS [Id], [Extent1].[UserId] AS [UserId], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
            FROM [dbo].[HubConnections] AS [Extent1]
        )  AS [Extent1]
        WHERE [Extent1].[row_number] > 1
    )  AS [Skip1]
    WHERE [Skip1].[UserId] = @p__linq__0
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT 
    1 AS [C1]
    FROM ( SELECT [Extent2].[Id] AS [Id], [Extent2].[UserId] AS [UserId]
        FROM ( SELECT [Extent2].[Id] AS [Id], [Extent2].[UserId] AS [UserId], row_number() OVER (ORDER BY [Extent2].[Id] ASC) AS [row_number]
            FROM [dbo].[HubConnections] AS [Extent2]
        )  AS [Extent2]
        WHERE [Extent2].[row_number] > 1
    )  AS [Skip2]
    WHERE [Skip2].[UserId] = @p__linq__0
)) THEN cast(0 as bit) END AS [C1]
FROM  ( SELECT 1 AS X ) AS [SingleRowTable1];

Which one is the proper way here? Is there a big performance difference between these two?

like image 557
tugberk Avatar asked Apr 24 '13 13:04

tugberk


1 Answers

Query performance depends on a lot of things, like the indexes that are present, the actual data, how stale the statistics about the data present are etc. SQL query plan optimizer looks at these different metrics to come up with an efficient query plan. So, any straightforward answer that says query 1 is always better than query 2 or the opposite would be incorrect.

That said, my answer below tries to explain the articles stance and how Skip(1).Any() could be better(marginally) than doing a Count() > 1. The second query though being bigger in size and mostly unreadable looks like it could be interpreted in an efficient fashion. Again, this depends on things aforementioned. The idea is that the number of rows that the database has to look into to figure out the result is more in case of Count(). In the count case, assuming that the required indexes are there (a clustered index on Id to make the OrderBy in second case efficient), the db has to go through count number of rows. In the second case, it has to go through a maximum of two rows to arrive at the answer.

Lets get more scientific in our analysis and see if my above theory holds any ground. For this, I am creating a dummy database of customers. The Customer type looks like this,

public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

I am seeding the database with some 100K random rows(I really have to prove this) using this code,

    for (int j = 0; j < 100; j++)
    {
        using (CustomersContext db = new CustomersContext())
        {
            Random r = new Random();
            for (int i = 0; i < 1000; i++)
            {
                Customer c = new Customer
                {
                    Name = Guid.NewGuid().ToString(),
                    Age = r.Next(0, 100)
                };
                db.Customers.Add(c);
            }
            db.SaveChanges();
        }
    }

Sample code here.

Now, the queries that I am going to use are as follows,

db.Customers.Where(c => c.Age == 26).Count() > 1; // scenario 1

db.Customers.Where(c => c.Age == 26).OrderBy(c => c.ID).Skip(1).Any() // scenario 2

I have started SQL profiler to catch the query plans. The captured plans look as follows,

Scenario 1:

Check out the estimated cost and actual row count for scenario 1 in the above images. Scenario 1 - Estimated CostScenario 1 - Actual row count

Scenario 2:

Check out the estimated cost and actual row count for scenario 2 in the below images. Scenario 2 - Estimated CostScenario 2 - Actual row count

As per the initial guess, the estimated cost and the number of rows is lesser in the Skip and any case as compared to Count case.

Conclusion:

All this analysis aside, as many others have commented earlier, these are not the kind of performance optimizations you should try to do in your code. Things like these hurt readability with very minimal(I would say non-existent) perf benefit. I just did this analysis for fun and would never use this as a basis for choosing scenario 2. I would measure and see if doing a Count() is actually hurting to change the code to use Skip().Any().

like image 150
RaghuRam Nadiminti Avatar answered Nov 14 '22 07:11

RaghuRam Nadiminti