I am currently trying to create a stored procedure in T-SQL which takes XML as its input and inserts the data it contains into a temporary table.
The XML that I am using has the following format:
<Table>
<row MyFirstColumn="foo" MySecondColumn="bar" ... />
</Table>
The SQL that I am using to insert this XML data into a temporary table is of the following format:
INSERT INTO #TempTable
SELECT
    T.c.value('@MyFirstColumn', 'varchar(50)')
    ,T.c.value('@MySecondColumn', 'varchar(50)')
    ,...
FROM
    @x.nodes('//Table/row') T(c)
However, I am doing this with XML documents containing 150 columns and upwards of 200,000 rows. At present, executing this SQL on just 10,000 rows takes ~142 seconds, which makes it completely unsuitable for documents of that size.
Can anyone suggest a way to speed up this process?
Shredding XML with nodes()/value() in SQL Server has performance issues when you query a lot of columns: each extracted column adds another nested loop join with a call to an internal XML function to the query plan.
Query plan with 3 columns: [query plan screenshot]
Query plan with 5 columns: [query plan screenshot]
Just imagine what it would look like with more than 150 columns.
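To make the shape concrete, here is a minimal 3-column query of the kind those plans come from (the inline sample document and the integer attributes C1..C3 are just illustrative):

declare @X xml = N'<Table><row C1="1" C2="2" C3="3"/></Table>';

select T.c.value('@C1', 'int') as C1,
       T.c.value('@C2', 'int') as C2,
       T.c.value('@C3', 'int') as C3
from @X.nodes('/Table/row') as T(c);

Each .value() call shows up as its own nested loop branch, so the plan keeps growing with the column count.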
Another option for you is OPENXML. It does not suffer from the same per-column problem. Your query would look something like this:
declare @H int;
declare @X xml = N'<Table><row C1="1" C2="2" C3="3"/></Table>'; -- stand-in for your input XML

-- Parse the document once up front.
exec sys.sp_xml_preparedocument @H output,
                                @X;

select C1,
       C2,
       C3
from openxml(@H, 'Table/row', 0) -- flag 0 = attribute-centric mapping
with (
    C1 int,
    C2 int,
    C3 int
);

-- Free the parsed document; it is held in SQL Server's internal cache until you do.
exec sys.sp_xml_removedocument @H;
For me, a test with 150 columns and 1000 rows took about 14 seconds with nodes()/value() and 3 seconds with OPENXML.
Quite a change.
Code used for testing:
if object_id('dbo.T') is not null drop table T; -- drop any leftover test table
go
declare @C int = 150;     -- number of columns
declare @S nvarchar(max); -- buffer for dynamic SQL
declare @X xml;
declare @N int = 1000;    -- number of rows
declare @D datetime;      -- timer start
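-- Build and run: create table T(C1 int, C2 int, ..., C150 int).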
set @S = 'create table T('+
stuff((
select top(@C) ', '+N'C'+cast(row_number() over(order by 1/0) as nvarchar(3)) + N' int'
from sys.columns
for xml path('')
), 1, 2, '') + ')'
exec sp_executesql @S;
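-- Fill T with @N rows, every column set to 1.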
set @S = 'insert into T select top(@N) '+
stuff((
select top(@C) ',1'
from sys.columns as c1
for xml path('')
), 1, 1, '') + ' from sys.columns as c1, sys.columns as c2';
exec sp_executesql @S, N'@N int', @N;
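-- Serialize the whole table to attribute-centric XML: <Table><row C1="1" C2="1" ... /></Table>.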
set @X = (
select *
from dbo.T
for xml raw, root('Table')
);
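-- Build the nodes()/value() query, one .value() call per column, then run it and time it.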
set @S = 'select '+
stuff((
select top(@C) ', '+N'T.X.value(''@C'+cast(row_number() over(order by 1/0) as nvarchar(3)) + N''', ''int'')'
from sys.columns
for xml path('')
), 1, 2, '') + ' from @X.nodes(''Table/row'') as T(X)'
set @D = getdate();
exec sp_executesql @S, N'@X xml', @X;
select datediff(second, @D, getdate());
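-- Build the equivalent openxml query and time it the same way.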
set @S = 'declare @H int;
exec sp_xml_preparedocument @H output, @X;
select *
from openxml(@H, ''Table/row'', 0)
with (' +
stuff((
select top(@C) ', C'+cast(row_number() over(order by 1/0) as nvarchar(3))+ ' int'
from sys.columns
for xml path('')
), 1, 2, '') + ');
exec sys.sp_xml_removedocument @H';
set @D = getdate();
exec sp_executesql @S, N'@X xml', @X
select datediff(second, @D, getdate());
I really liked and voted for Mikael Eriksson's answer, but there is one caveat:
His test generates a 909 KB XML document with 1000 rows and 150 columns, and sp_xml_preparedocument takes only 226 milliseconds in his case (which is really fast), but...
I tried applying it to my XML document, which is 521 MB. It contains 2,045,156 rows with 11 different columns, all read as nvarchar(255).
When I selected all 11 columns via *, openxml worked better in this case.
When I selected only 2 columns, .value() worked better in this case.
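For reference, the two query shapes I compared look roughly like this; the attribute names C1..C11 and the tiny inline document are placeholders for my real 521 MB document:

declare @X xml = N'<Table><row C1="a" C2="b" C3="c" C4="d" C5="e" C6="f" C7="g" C8="h" C9="i" C10="j" C11="k"/></Table>';
declare @H int;

-- Variant 1: nodes()/value(), reading only 2 of the 11 attributes.
select T.c.value('@C1', 'nvarchar(255)') as C1,
       T.c.value('@C2', 'nvarchar(255)') as C2
from @X.nodes('/Table/row') as T(c);

-- Variant 2: openxml, reading all 11 attributes via *.
exec sys.sp_xml_preparedocument @H output, @X;
select *
from openxml(@H, 'Table/row', 0)
with (
    C1 nvarchar(255), C2 nvarchar(255), C3 nvarchar(255), C4 nvarchar(255),
    C5 nvarchar(255), C6 nvarchar(255), C7 nvarchar(255), C8 nvarchar(255),
    C9 nvarchar(255), C10 nvarchar(255), C11 nvarchar(255)
);
exec sys.sp_xml_removedocument @H;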
So it looks like which method is faster actually depends on the XML size, the number of rows, and the number of columns you query from the XML.