Slow SQL data retrieval with SqlDataReader.Read() in C# vs SSMS

I am running a simple SQL query that returns a lot of data. The complexity of the query is not the issue: it takes around 200 ms to execute. The amount of data, however, seems to be.

We retrieve around 40k rows. Each row has 8 columns and only amounts to a few hundred bytes, so we download roughly 15 MB in total for this query.

What boggles my mind is this: when I execute the query from basic C# code it takes 1 min 44 s, but when I run it from SSMS it takes 10 s. Of course I do this from the same machine, against the same database, and I can see the SSMS grid being populated in real time: in 10 seconds the whole result set is displayed.

We tried:

  • to set the same SET options as SSMS,
  • to change the transaction isolation level,
  • to bypass the cached execution plan (with OPTION(RECOMPILE)),
  • to ignore locks (with WITH(NOLOCK)).

None of it changes anything. That makes sense: it's the read that is slow, not the query (IMHO).

It is the while (reader.Read()) loop that takes the time, and we tried it with an empty loop body, which rules out boxing/unboxing or the cost of materializing the results in memory.

Here is the test program we wrote to confirm that it is Read() that takes the time:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;
using System.Transactions;

namespace SqlPerfTest
{
    class Program
    {
        const int GroupId = 1234;
        static readonly DateTime DateBegin = new DateTime(2017, 6, 19, 0, 0, 0, DateTimeKind.Utc);
        static readonly DateTime DateEnd = new DateTime(2017, 10, 20, 0, 0, 0, DateTimeKind.Utc);
        const string ConnectionString = "CENSORED";

        static void Main(string[] args)
        {
            TransactionOptions transactionOptions = new TransactionOptions
            {
                IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted
            };

            using (var transactionScope = new TransactionScope(TransactionScopeOption.Required, transactionOptions))
            {

                using (SqlConnection connection = new SqlConnection(ConnectionString))
                {
                    connection.Open();

                    SetOptimizations(connection);
                    ShowUserOptions(connection);
                    DoPhatQuery(connection).Wait(TimeSpan.FromDays(1));
                }
                transactionScope.Complete();
            }
        }

        static void SetOptimizations(SqlConnection connection)
        {
            SqlCommand cmd = connection.CreateCommand();
            Console.WriteLine("===================================");

            cmd.CommandText = "SET QUOTED_IDENTIFIER ON";
            cmd.ExecuteNonQuery();
            Console.WriteLine(cmd.CommandText);

            cmd.CommandText = "SET ANSI_NULL_DFLT_ON ON";
            cmd.ExecuteNonQuery();
            Console.WriteLine(cmd.CommandText);

            cmd.CommandText = "SET ANSI_PADDING ON";
            cmd.ExecuteNonQuery();
            Console.WriteLine(cmd.CommandText);

            cmd.CommandText = "SET ANSI_WARNINGS ON";
            cmd.ExecuteNonQuery();
            Console.WriteLine(cmd.CommandText);

            cmd.CommandText = "SET ANSI_NULLS ON";
            cmd.ExecuteNonQuery();
            Console.WriteLine(cmd.CommandText);

            cmd.CommandText = "SET CONCAT_NULL_YIELDS_NULL ON";
            cmd.ExecuteNonQuery();
            Console.WriteLine(cmd.CommandText);

            cmd.CommandText = "SET ARITHABORT ON";
            cmd.ExecuteNonQuery();
            Console.WriteLine(cmd.CommandText);

            cmd.CommandText = "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED";
            cmd.ExecuteNonQuery();
            Console.WriteLine(cmd.CommandText);

            cmd.CommandText = "SET DEADLOCK_PRIORITY -1";
            cmd.ExecuteNonQuery();
            Console.WriteLine(cmd.CommandText);

            cmd.CommandText = "SET QUERY_GOVERNOR_COST_LIMIT 0";
            cmd.ExecuteNonQuery();
            Console.WriteLine(cmd.CommandText);

            cmd.CommandText = "SET TEXTSIZE 2147483647";
            cmd.ExecuteNonQuery();
            Console.WriteLine(cmd.CommandText);
        }

        static void ShowUserOptions(SqlConnection connection)
        {
            SqlCommand cmd = connection.CreateCommand();
            Console.WriteLine("===================================");

            cmd.CommandText = "DBCC USEROPTIONS";
            using (SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
            {
                Console.WriteLine(cmd.CommandText);

                while (reader.HasRows)
                {
                    while (reader.Read())
                    {
                        Console.WriteLine("{0} = {1}", reader.GetString(0), reader.GetString(1));
                    }
                    reader.NextResult();
                }
            }
        }

        static async Task DoPhatQuery(SqlConnection connection)
        {
            Console.WriteLine("===================================");

            SqlCommand cmd = connection.CreateCommand();
            cmd.CommandText =
                @"SELECT
                    p.[Id],
                    p.[UserId],
                    p.[Text]
                FROM [dbo].[Post] AS p WITH (NOLOCK)
                WHERE p.[Visibility] = @visibility
                    AND p.[GroupId] = @groupId
                    AND p.[DatePosted] >= @dateBegin
                    AND p.[DatePosted] < @dateEnd
                ORDER BY p.[DatePosted] DESC
                OPTION(RECOMPILE)";
            cmd.Parameters.Add("@visibility", SqlDbType.Int).Value = 0;
            cmd.Parameters.Add("@groupId", SqlDbType.Int).Value = GroupId;
            cmd.Parameters.Add("@dateBegin", SqlDbType.DateTime).Value = DateBegin;
            cmd.Parameters.Add("@dateEnd", SqlDbType.DateTime).Value = DateEnd;
            Console.WriteLine(cmd.CommandText);
            Console.WriteLine("===================================");

            DateTime beforeCommit = DateTime.UtcNow;
            using (SqlDataReader reader = await cmd.ExecuteReaderAsync(CommandBehavior.CloseConnection))
            {
                DateTime afterCommit = DateTime.UtcNow;
                Console.WriteLine("Query time = {0}", afterCommit - beforeCommit);

                DateTime beforeRead = DateTime.UtcNow;
                int currentRow = 0;
                while (reader.HasRows)
                {
                    while (await reader.ReadAsync())
                    {
                        if (currentRow++ % 1000 == 0)
                            Console.WriteLine("[{0}] Rows read = {1}", DateTime.UtcNow, currentRow);
                    }
                    await reader.NextResultAsync();
                }

                Console.WriteLine("[{0}] Rows read = {1}", DateTime.UtcNow, currentRow);

                DateTime afterRead = DateTime.UtcNow;
                Console.WriteLine("Read time = {0}", afterRead - beforeRead);
            }
        }
    }
}

As you can see above, we reproduce the same SET options as SSMS. We also tried every trick known to mankind to speed things up: async calls, WITH(NOLOCK), OPTION(RECOMPILE), a bigger packet size in the connection string (sketched below), and a sequential-access reader. Nothing helped; SSMS still finishes about 10 times faster (1 min 44 s vs 10 s).
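
For reference, here is roughly how we defined the bigger packet size (a minimal sketch: the server, database and credentials below are placeholders, not our real values, and 32767 is simply the largest packet size SQL Server accepts):

using System.Data.SqlClient;

namespace SqlPerfTest
{
    static class ConnectionStringSketch
    {
        // Minimal sketch of the bigger-packet-size attempt.
        // Server, database and credentials are placeholders.
        public static string Build()
        {
            var builder = new SqlConnectionStringBuilder
            {
                DataSource = "our-server.database.windows.net", // placeholder Azure SQL server
                InitialCatalog = "OurDatabase",                 // placeholder database name
                UserID = "our-user",                            // placeholder credentials
                Password = "CENSORED",
                PacketSize = 32767 // maximum allowed; the SqlClient default is 8000 bytes
            };
            return builder.ConnectionString;
        }
    }
}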

More info

Our database is an Azure SQL database. We actually have two databases, one in Europe and one in the western US. Since we are located in Europe, the same query is faster against the Europe database, but it still takes around 30 s from C# while it is near-instant in SSMS. Network transfer speed does influence this, but it's not the main issue.

We can also reduce the transfer time by projecting fewer columns, which of course speeds up the Read() iteration. If we retrieve only the Id column, the while (Read()) loop takes about 5 s. But that's not an option, as we need all of these columns.
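
For completeness, the single-column test is just a variant of DoPhatQuery above with only the Id column projected (a sketch; it reuses the GroupId/date constants from the test program and nothing else changes):

static async Task DoIdOnlyQuery(SqlConnection connection)
{
    SqlCommand cmd = connection.CreateCommand();
    // Same filter and ordering as DoPhatQuery, but only one column is projected,
    // so each row is much smaller on the wire.
    cmd.CommandText =
        @"SELECT p.[Id]
        FROM [dbo].[Post] AS p WITH (NOLOCK)
        WHERE p.[Visibility] = @visibility
            AND p.[GroupId] = @groupId
            AND p.[DatePosted] >= @dateBegin
            AND p.[DatePosted] < @dateEnd
        ORDER BY p.[DatePosted] DESC
        OPTION(RECOMPILE)";
    cmd.Parameters.Add("@visibility", SqlDbType.Int).Value = 0;
    cmd.Parameters.Add("@groupId", SqlDbType.Int).Value = GroupId;
    cmd.Parameters.Add("@dateBegin", SqlDbType.DateTime).Value = DateBegin;
    cmd.Parameters.Add("@dateEnd", SqlDbType.DateTime).Value = DateEnd;

    int rows = 0;
    using (SqlDataReader reader = await cmd.ExecuteReaderAsync())
    {
        while (await reader.ReadAsync())
        {
            rows++; // rows are counted but not materialized, same as the main test
        }
    }
    Console.WriteLine("Id-only rows read = {0}", rows);
}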

We know how to 'solve' the issue: we can approach the problem differently, run small daily queries and cache the results in Azure Table storage or something similar. But we want to know WHY SSMS is faster. What's the trick?

We tried Entity Framework, Dapper, and the plain ADO.NET code shown above. I have seen a few people on the interwebz with what looks like a similar issue. To me it feels like it's SqlDataReader that is slow, as if it doesn't pipeline the download of the rows over multiple connections or something.

Question

So my question here is this: how the hell does Management Studio manage to download the result of our query so much faster? What's the trick?

Thanks guys.

Asked Oct 20 '17 by Kevin

1 Answer

What boggles my mind is that: when I execute the query from a basic C# code it takes 1min and 44secs. But when I do it from SSMS it takes 10 secs

You can't execute a parameterized query directly in SSMS so you're comparing different things. When you use local variables instead of parameters in SSMS, SQL Server estimates row counts using overall average density statistics. With a parameterized query, SQL Server uses the statistics histogram and supplied parameter values for initial compilation. Different estimates can result in different plans, although the estimates from the histogram are usually more accurate and yield a better plan (theoretically).

Try updating statistics and executing the query from SSMS using sp_executesql and parameters. I would expect the same performance as the app code, good or bad.
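
For example, something along these lines in SSMS exercises the same parameterized path as the app (a sketch built from the query and values in the question; the UPDATE STATISTICS target is simply the table that query reads):

-- Refresh statistics on the table the query reads.
UPDATE STATISTICS [dbo].[Post];

-- Run the query the same way the app does: parameterized via sp_executesql,
-- so the plan is compiled from the statistics histogram and these parameter values.
EXEC sp_executesql
    N'SELECT p.[Id], p.[UserId], p.[Text]
      FROM [dbo].[Post] AS p WITH (NOLOCK)
      WHERE p.[Visibility] = @visibility
        AND p.[GroupId] = @groupId
        AND p.[DatePosted] >= @dateBegin
        AND p.[DatePosted] < @dateEnd
      ORDER BY p.[DatePosted] DESC
      OPTION (RECOMPILE)',
    N'@visibility int, @groupId int, @dateBegin datetime, @dateEnd datetime',
    @visibility = 0,
    @groupId = 1234,
    @dateBegin = '2017-06-19T00:00:00',
    @dateEnd = '2017-10-20T00:00:00';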

Answered by Dan Guzman