Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a faster way than this to extract data from XML nodes in T-SQL?

I am currently trying to create a stored procedure in T-SQL which takes an XML table as its input, and then inserts the data in it into a temporary table.

The XML that I am using has the following format:

<Table>
    <row MyFirstColumn="foo" MySecondColumn="bar" ... />
</Table>

The SQL that I am using to insert this XML data into a temporary table is of the following format:

INSERT INTO
    #TempTable
SELECT
    T.c.value('@MyFirstColumn', 'varchar(50)')
   ,T.c.value('@MySecondColumn', 'varchar(50)')
   ,...
FROM
    @x.nodes('//Table/row') T(c)

However, I am doing this with XML tables containing 150 columns and upwards of 200,000 rows. At present, executing this SQL on 10,000 rows takes ~142 seconds, so this is completely inappropriate for dealing with XML tables containing large numbers of rows.

Can anyone suggest a way to speed up this process?

like image 695
J. Chapman Avatar asked Sep 18 '18 17:09

J. Chapman


2 Answers

Shredding XML with nodes()/value() in SQL Server has performance issues when you query a lot of columns. There is one nested loop join with a call to a xml function for each column.

Query plan with 3 columns:

enter image description here

Query plan with 5 columns:

enter image description here

Just imagine what it would look like with more than 150 columns.

Another option for you is to use OPENXML. It does not have the same problems with many columns.

Your query would look something like this:

declare @H int;
declare @X xml;

exec sys.sp_xml_preparedocument @H output,
                                @X;

select C1,
       C2,
       C3
from
       openxml(@H, 'Table/row', 0)
       with (
              C1 int,
              C2 int,
              C3 int
            );

exec sys.sp_xml_removedocument @H;

For me, using 150 columns and 1000 rows took about 14 seconds with nodes()/value() and 3 seconds with OPENXML.

Vote for a change.

Code used for testing;

drop table T;

go

declare @C int = 150;
declare @S nvarchar(max);
declare @X xml;
declare @N int = 1000;
declare @D datetime;

set @S = 'create table T('+
stuff((
      select top(@C) ', '+N'C'+cast(row_number() over(order by 1/0) as nvarchar(3)) + N' int'
      from sys.columns
      for xml path('')
      ), 1, 2, '') + ')'

exec sp_executesql @S;

set @S = 'insert into T select top(@N) '+
stuff((
      select top(@C) ',1'
      from sys.columns as c1
      for xml path('')
      ), 1, 1, '') + ' from sys.columns as c1, sys.columns as c2';

exec sp_executesql @S, N'@N int', @N;

set @X = (
         select *
         from dbo.T
         for xml raw, root('Table')
         );

set @S = 'select '+
stuff((
      select top(@C) ', '+N'T.X.value(''@C'+cast(row_number() over(order by 1/0) as nvarchar(3)) + N''', ''int'')'
      from sys.columns
      for xml path('')
      ), 1, 2, '') + ' from @X.nodes(''Table/row'') as T(X)'

set @D = getdate();
exec sp_executesql @S, N'@X xml', @X;
select datediff(second, @D, getdate());

set @S = 'declare @H int;
exec sp_xml_preparedocument @H output, @X;

select *
from openxml(@H, ''Table/row'', 0)
  with (' +
stuff((
      select top(@C) ', C'+cast(row_number() over(order by 1/0) as nvarchar(3))+ ' int'
      from sys.columns
      for xml path('')
      ), 1, 2, '') + ');
exec sys.sp_xml_removedocument @H';

set @D = getdate();
exec sp_executesql @S, N'@X xml', @X
select datediff(second, @D, getdate());
like image 129
Mikael Eriksson Avatar answered Sep 18 '22 00:09

Mikael Eriksson


I really liked and voted for Mikael Eriksson answer, but there's one aspect about it:

His test generates 909 KB XML document with 1000 rows, 150 columns. And sp_xml_preparedocument takes only 226 milliseconds in his case (which is really fast), but...

I tried applying it to my XML document which is 521 MB. It contains 2045156 rows with 11 different columns, all are read as nvarchar(255)

When I selected all 11 columns via *:

  • select * via .value() took 297 sec
  • select * via openxml took 231 sec in total: (sp_xml_preparedocument took 107 sec, select * from openxml took 123 sec)

openxml works better in this case!

When I selected only 2 columns:

  • select 2 columns via .value() took 57 sec
  • select 2 columns via openxml took 189 sec in total: (sp_xml_preparedocument - 86 sec, select * from openxml - 103 sec)

.value() works better in this case!

So it looks like which method is faster actually depends on xml size, number of rows and number of columns that you query from xml!

like image 27
Dmitriy Grishin - dogrishin Avatar answered Sep 19 '22 00:09

Dmitriy Grishin - dogrishin