Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Flattening hierarchical XML in SQL using the nodes() method

I have a Stored Procedure that takes an XML document as a parameter similar in a structure to the following:

<grandparent name="grandpa bob">
  <parent name="papa john">
    <children>
      <child name="mark" />
      <child name="cindy" />
    </children>
  </parent>
  <parent name="papa henry">
    <children>
      <child name="mary" />
    </children>
  </parent>
</grandparent>

My requirement is to "flatten" this data so that it can be inserted into a temporary table and manipulated further down the procedure, so the above XML becomes:

Grandparent Name Parent Name     Child Name
---------------- --------------- ---------------
grandpa bob      papa john       mark
grandpa bob      papa john       cindy
grandpa bob      papa henry      mary

This is currently being done using SQL Server XML Nodes:

SELECT
    VIRT.node.value('../../../@name','varchar(15)') 'Grandparent Name',
    VIRT.node.value('../../@name','varchar(15)') 'Parent Name',
    VIRT.node.value('@name','varchar(15)') 'Child Name'
FROM
    @xmlFamilyTree.nodes('/grandparent/parent/children/child') AS VIRT(node)

This works great until I start throwing large amounts of data at the procedure (i.e. 1000+ child nodes), at which point this grinds to a halt and takes between 1 and 2 minutes to execute. I think this may be due to the fact that I am starting off at the lowest level (<child) and then traversing back up the XML document for each occurance. Would splitting this single query into 3 chunks (one per node that I need data from) improve performance here? Given that none of these nodes have "keys" on them that I could use to join back up with, could anyone offer any pointers how I might be able to go about doing this?

like image 808
Matt Weldon Avatar asked Apr 27 '11 07:04

Matt Weldon


People also ask

What is XML node in SQL Server?

The nodes() method is useful when you want to shred an xml data type instance into relational data. It allows you to identify nodes that will be mapped into a new row.

What is XML format in SQL Server?

SQL Server provides an XML schema that defines syntax for writing XML format files to use for bulk importing data into a SQL Server table. XML format files must adhere to this schema, which is defined in the XML Schema Definition Language (XSDL).


1 Answers

I seem to have answered my own question after a bit more looking around online:

SELECT
    grandparent.gname.value('@name', 'VARCHAR(15)'),
    parent.pname.value('@name', 'VARCHAR(15)'),
    child.cname.value('@name', 'VARCHAR(15)')
FROM
    @xmlFamilyTree.nodes('/grandparent') AS grandparent(gname)
CROSS APPLY
    grandparent.gname.nodes('*') AS parent(pname)
CROSS APPLY
    parent.pname.nodes('children/*') AS child(cname)

Using CROSS APPLY I can select the top-level grandparent node and use this to select the child parent nodes and so on. Using this method I have taken my query from executing in around 1 minute 30 seconds down to around 6 seconds.

Interestingly though, if I use the "old" OPEN XML method to retrieve the same data, the query executes in 1 second!

It seems like you may have to approach the use of these two techniques on a case-by-case basis depending on the expected size/complexity of the document being passed in.

like image 86
Matt Weldon Avatar answered Oct 09 '22 01:10

Matt Weldon