Queries that include a column with large NVARCHAR values from SQL Azure are slow

I have a table in an Azure database that has begun responding slowly to queries. The query looks like:

SELECT [Id] --nvarchar(128), PrimaryKey
      ,[Name] --nvarchar(max)
      ,[Description] --nvarchar(max)
      ,[Modified]  --datetime2(7)
      ,[LastModifiedBy] --nvarchar(max)
      ,[Opened]  --datetime2(7)
      ,[Editor] --nvarchar(max)
      ,[Json] --nvarchar(max)   <--THIS IS GIVING ME PROBLEMS
      ,[IsActive] --bit
  FROM [dbo].[TableName]

Specifically, when I include the [Json] column in the query, the SQL query performance goes from less than a second to minutes. Even requesting only a single record can take minutes when the [Json] column is included. This column contains long JSON-formatted strings (~500,000 characters). The performance only breaks down when this column is included -- the other NVARCHAR(max) columns that contain smaller strings are not a problem.

I discovered this issue through performance problems in an MVC5 application using an Entity Framework LINQ-to-Entities query:

var model=await db.TableName.FirstOrDefaultAsync(s => s.Id == id);

which produced a SQL query like the one above. The Edit method for a single record, which had run with no problems on a local development machine, was taking minutes to load on the server. I then ran direct database queries to see what the issue was and found the long query times.

This performance issue is not consistent across different methods of querying.

I have a turnaround time of 3 minutes with the following query:

SELECT Json FROM [dbo].[TableName] WHERE [Id]=<id>

The turnaround time appears to grow exponentially with the length of the returned string. For example, this query takes about 10 seconds:

SELECT SUBSTRING(Json,1,50000) FROM [dbo].[TableName] WHERE [Id]=<id>

Queries like the following, which run entirely on the server, take less than a second:

DECLARE @variable nvarchar(max);
SELECT @variable=Json FROM [dbo].[TableName] where Id='<id>';
SELECT LEN(@variable);

but actually retrieving the data as in the following takes me back up to several minutes:

DECLARE @variable nvarchar(max);
SELECT @variable=Json FROM [dbo].[TableName] where Id='<id>';
SELECT @variable;

My ultimate goal is to get Entity Framework's LINQ-to-Entities query to perform at a reasonable speed so I can use the data in C#, and I do not think I can force EF to produce a query like the one above dynamically.

I have never encountered this difficulty before with other tables storing large strings. Is there a setting I have mistakenly configured, or is there a best practice for building EF LINQ-to-Entities queries in this situation?
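
For reference, a rough sketch of the kind of projection that would avoid pulling [Json] with the rest of the row (the entity and property names here just mirror the table definition above; the result is an anonymous type, not the tracked entity):

// Sketch only: project away the large [Json] column so EF does not
// materialize it with the rest of the row, then fetch it on demand.
// Requires: using System.Linq; using System.Data.Entity; (EF6 async extensions)
var model = await db.TableName
    .Where(s => s.Id == id)
    .Select(s => new
    {
        s.Id,
        s.Name,
        s.Description,
        s.Modified,
        s.LastModifiedBy,
        s.Opened,
        s.Editor,
        s.IsActive
        // [Json] intentionally omitted
    })
    .FirstOrDefaultAsync();

// Pull the large payload only when it is actually required:
var json = await db.TableName
    .Where(s => s.Id == id)
    .Select(s => s.Json)
    .FirstOrDefaultAsync();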

For comparison, there is no performance issue when running the queries on a local instance of SQL Server with a copy of the same database; all queries return in less than a second.

-UPDATE-

I have been monitoring the database, and this issue has disappeared without any code change. All query response times are back to under a second. However, there was also no notification of a service outage from Azure. In fact, throughout the duration of the issue the database was completely accessible, and the only issue was the slow queries involving fields returning large string values.

The downside is that I can no longer reproduce the issue.

For others with this issue on Azure (which appears to be intermittent), the diagnostic symptoms of this behavior are:

  1. no Azure-SQL Server outage
  2. healthy response times for queries that do not return large string values
  3. normal resource usage of the Azure-SQL Server for any queries, even if they return large string values
  4. In dependent applications, two types of errors are thrown by the connecting application: a) connection timeout errors, and b) closed connection errors. There is no contextual information that distinguishes when either type of error gets thrown.
  5. The response time for queries involving long strings grows roughly exponentially with the length of the string returned. For example, 10 characters is instantaneous, 50,000 takes 10 seconds, 500,000 takes minutes, etc. However, the response times are not consistent.
  6. String processing (even on very long strings) performed entirely on the server that does NOT require returning a long string value takes a normal amount of time (see the sketch after this list).
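
A minimal way to check symptoms 5 and 6 from the application side (a sketch only; the connection string is assumed, and the table and column names are taken from the question):

// Sketch: compare server-side-only processing (LEN) against actually
// returning the string. If the transfer of data is the problem, the first
// query stays fast while the second slows down with the length of [Json].
// Requires: using System; using System.Data.SqlClient; using System.Diagnostics;
static void TimeQuery(string connectionString, string sql)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.CommandTimeout = 600;      // the slow case can take minutes
        conn.Open();
        var sw = Stopwatch.StartNew();
        cmd.ExecuteScalar();           // forces the value to be read back to the client
        Console.WriteLine($"{sw.ElapsedMilliseconds} ms -- {sql}");
    }
}

// TimeQuery(connectionString, "SELECT LEN([Json]) FROM [dbo].[TableName] WHERE [Id] = '<id>'");
// TimeQuery(connectionString, "SELECT [Json] FROM [dbo].[TableName] WHERE [Id] = '<id>'");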

I will leave this question open in case someone has an actual answer, but it appears that the solution was simply to wait for Azure to resolve whatever they were modifying about their request handling. It appears that the issue was related to the transfer of data from the Azure database rather than to processing on the server. My top recommendation is not to tear apart code that is working flawlessly on the development box if the issue is characterized by the symptoms noted above.

asked Aug 23 '16 by longestwayround

1 Answer

It really is too bad you can't reproduce this error, because I have a thought on what your issue is, and it all comes from your statement "when I include the [Json] column in the query, the SQL query performance goes from less than a second to minutes." You also gave me the clue:

Queries like the following, which run entirely on the server, take less than a second:

DECLARE @variable nvarchar(max);
SELECT @variable=Json FROM [dbo].[TableName] where Id='<id>';
SELECT LEN(@variable);

When retrieving the data is the issue, you've likely got a problem with the wait type ASYNC_NETWORK_IO. Basically, sending the data out of SQL Server to the waiting application is the bottleneck.
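
If the problem comes back, one quick way to confirm that wait type from the client while the slow query runs on another connection is a sketch like the following (sys.dm_exec_requests needs VIEW DATABASE STATE permission in Azure SQL Database; the connection string is an assumption):

// Sketch: look for sessions currently waiting on ASYNC_NETWORK_IO.
// Requires: using System; using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    @"SELECT session_id, wait_type, wait_time
        FROM sys.dm_exec_requests
       WHERE wait_type = 'ASYNC_NETWORK_IO';", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine($"session {reader["session_id"]}: waiting {reader["wait_time"]} ms");
    }
}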

My first question is where the application is running. Is it in Azure, in the same data center as the database, or is it running from outside that data center? The closer you can get your application to the data, the less you will see this wait type.

The other question is about the hardware running the application: does it have sufficient memory to receive all that data and then process it in a useful way? Occasionally the network wait is actually a problem with underpowered hardware on the application side.

I have a couple of additional thoughts to share. If you're going to deal with very large JSON objects, have you considered using DocumentDB to store them instead of Azure SQL Database? It's optimized for that kind of workload, plus you can write SQL queries against the JSON documents stored in DocumentDB.
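
A minimal sketch of that idea with the DocumentDB .NET SDK (the account URI, key, database, and collection names are placeholders, not anything from the question):

// Sketch: store the large JSON payload as a document instead of an NVARCHAR(MAX) column.
// Requires the Microsoft.Azure.DocumentDB package:
// using System; using System.Linq; using Microsoft.Azure.Documents.Client;
var client = new DocumentClient(new Uri("https://<account>.documents.azure.com:443/"), "<key>");
var collection = UriFactory.CreateDocumentCollectionUri("<database>", "<collection>");

// Write the payload once...
await client.CreateDocumentAsync(collection, new { id = "<id>", payload = jsonObject });

// ...and query it back with DocumentDB's SQL syntax only when it is needed.
var doc = client.CreateDocumentQuery<dynamic>(collection,
        "SELECT * FROM c WHERE c.id = '<id>'")
    .AsEnumerable()
    .FirstOrDefault();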

Regarding:

In dependent applications, two types of errors are thrown by the connecting application: a) connection timeout errors, and b) closed connection errors. There is no contextual information that distinguishes when either type of error gets thrown.

You're going to need some kind of retry policy in any application hitting a database. It's critical when dealing with Azure SQL Database, since you have three copies of your database at any given time, and you may need to suffer a failover in the middle of the day. If you have that retry policy, end users will never know there was a problem, since the secondary copies are available in under a second.
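
For EF6, a minimal sketch of such a retry policy looks roughly like this (the class name, retry count, and maximum delay are arbitrary placeholder values):

// Sketch: register SqlAzureExecutionStrategy so EF6 retries transient
// Azure SQL failures automatically. Picked up by convention when this
// class lives in the same assembly as your DbContext.
// Requires: using System; using System.Data.Entity; using System.Data.Entity.SqlServer;
public class AppDbConfiguration : DbConfiguration
{
    public AppDbConfiguration()
    {
        // Retry up to 5 times, waiting at most 10 seconds between attempts.
        SetExecutionStrategy("System.Data.Entity.SqlServer",
            () => new SqlAzureExecutionStrategy(5, TimeSpan.FromSeconds(10)));
    }
}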

I hope this helps!

answered Oct 25 '22 by Shannon Lowder