Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In SQL Server, deleting nodes using .modify() XQuery taking 38 minutes to execute

In SQL Server, I have a stored procedure with XML type temporary variable, and I am doing a delete operation on that variable. When I am running this stored procedure in my local VM which has 4 cores and 6 GB RAM, it takes 24 seconds to execute. But when I am running the same stored procedure in server with 40 cores and 128 GB RAM, this delete statement is taking more than 38 minutes to execute. The whole stored procedure gets hanged at this delete statement for 38 minutes. After commenting out the delete statement, the stored procedure executes in 8 seconds on the server. How I can fix this performance issue. Is there something wrong in ther SQL server configuration?

DECLARE @PaymentData AS XML

SET @PaymentData = .....(Main XML Query)

SET @PaymentData.modify('delete //*[not(node())]')

@Mikael: Below is the execution plan for shredding into rows solution on server (with 40 cores and 128 GB RAM) enter image description hereenter image description here And Below is the excecution plan in my local VM(with 4 cores and 6 GB RAM): enter image description hereenter image description here

like image 461
Rakesh Avatar asked May 04 '15 06:05

Rakesh


1 Answers

On my machine the delete took 1 hour 25 minutes and gave me this not so pretty query plan.

enter image description here

This plan finds all empty nodes (the ones to be deleted) and stores those in a Table Spool. Then for each node in the entire document there is a check if that node is present in the spool (Nested Loops (Left Semi Join)) and if it is that node is excluded from the final result (Merge join (Left Anti Semi Join)). The xml is then rebuilt from the nodes in the UDX operator and assigned to the variable. The table spool is not indexed so for each node that needs to be checked there will be a scan of the entire spool (or until a match is found).

That essentially means the performance of this algorithm is O(n*d) where n is the total number of nodes and d is the total number or deleted nodes.

There are a couple of possible workarounds.

First and perhaps the best is if you could modify your XML query to not produce the empty nodes in the first place. Totally possible if you create the XML with for xml and perhaps not possible if you already have parts of the XML stored in a table.

Another option is to shred the XML on Row (see sample XML below), put the result in a table variable, modify the XML in the table variable and then recreate the combined XML.

declare @T table(PaymentData xml);

insert into @T 
select T.X.query('.')
from @PaymentData.nodes('Row') as T(X);

update @T
set PaymentData.modify('delete //*[not(node())]');

select T.PaymentData as '*'
from @T as T
for xml path('');

enter image description here

This will give you the performance characteristic of O(n*s*d) where n is the number of row nodes, s is the number of sub-nodes per row node and d is the number of deleted rows per row node.

A third option that I really can't recommend is to use an undocumented trace flag that removes the use of a spool in the plan. You can try it out in test or you can perhaps capture the plan generated and use it in a plan guide.

declare @T table(PaymentData xml);

insert into @T values(@PaymentData);

update @T 
set PaymentData.modify('delete //*[not(node())]')
option (querytraceon 8690);

select @PaymentData = PaymentData
from @T;

Query plan with trace flag:

enter image description here

Instead of 1 hour 25 minutes, this version took 4 seconds on my computer.

Shredding the XML to multiple rows to the table variable took in total 6 seconds to execute.

Not having to delete any rows at all is of course the fastest.

Sample data, 12000 nodes with 32 subnodes where 2 is empty if you want to try this at home.

declare @PaymentData as xml;

set @PaymentData = (
                   select top(12000) 
                     1 as N1, 1 as N2, 1 as N3, 1 as N4, 1 as N5, 1 as N6, 1 as N7, 1 as N8, 1 as N9, 1 as N10, 
                     1 as N11, 1 as N12, 1 as N13, 1 as N14, 1 as N15, 1 as N16, 1 as N17, 1 as N18, 1 as N19, 1 as N20, 
                     1 as N21, 1 as N22, 1 as N23, 1 as N24, 1 as N25, 1 as N26, 1 as N27, 1 as N28, 1 as N29, 1 as N30,
                     '' as N31,
                     '' as N32
                   from sys.columns as c1, sys.columns as c2
                   for xml path('Row')
                   );

Note: I have no idea why it only took 24 seconds to execute on one of your servers. I would advise you to recheck that the XML actually is identical. Or why not test using the XML sample I have provided for you.

Update:

For the shredding version the problem with the spool in the delete query could be moved to the shredding query instead leaving you with about the same bad performance. That is however not always true. I have seen plans where there is no spool and plans where there are a spool and I don't know why it is there sometimes and why it is not at other times.

I have also found that if you use a temp table instead with insert ... into I don't get the spool in the shredding query.

select T.X.query('.') as PaymentData
into #T
from @PaymentData.nodes('Row') as T(X);

update #T
set PaymentData.modify('delete //*[not(node())]');
like image 96
Mikael Eriksson Avatar answered Oct 17 '22 10:10

Mikael Eriksson