I have a Stored Procedure that takes an XML document as a parameter similar in a structure to the following:
<grandparent name="grandpa bob">
<parent name="papa john">
<children>
<child name="mark" />
<child name="cindy" />
</children>
</parent>
<parent name="papa henry">
<children>
<child name="mary" />
</children>
</parent>
</grandparent>
My requirement is to "flatten" this data so that it can be inserted into a temporary table and manipulated further down the procedure, so the above XML becomes:
Grandparent Name Parent Name Child Name
---------------- --------------- ---------------
grandpa bob papa john mark
grandpa bob papa john cindy
grandpa bob papa henry mary
This is currently being done using SQL Server XML Nodes:
SELECT
VIRT.node.value('../../../@name','varchar(15)') 'Grandparent Name',
VIRT.node.value('../../@name','varchar(15)') 'Parent Name',
VIRT.node.value('@name','varchar(15)') 'Child Name'
FROM
@xmlFamilyTree.nodes('/grandparent/parent/children/child') AS VIRT(node)
This works great until I start throwing large amounts of data at the procedure (i.e. 1000+ child
nodes), at which point this grinds to a halt and takes between 1 and 2 minutes to execute. I think this may be due to the fact that I am starting off at the lowest level (<child
) and then traversing back up the XML document for each occurance. Would splitting this single query into 3 chunks (one per node that I need data from) improve performance here? Given that none of these nodes have "keys" on them that I could use to join back up with, could anyone offer any pointers how I might be able to go about doing this?
The nodes() method is useful when you want to shred an xml data type instance into relational data. It allows you to identify nodes that will be mapped into a new row.
SQL Server provides an XML schema that defines syntax for writing XML format files to use for bulk importing data into a SQL Server table. XML format files must adhere to this schema, which is defined in the XML Schema Definition Language (XSDL).
I seem to have answered my own question after a bit more looking around online:
SELECT
grandparent.gname.value('@name', 'VARCHAR(15)'),
parent.pname.value('@name', 'VARCHAR(15)'),
child.cname.value('@name', 'VARCHAR(15)')
FROM
@xmlFamilyTree.nodes('/grandparent') AS grandparent(gname)
CROSS APPLY
grandparent.gname.nodes('*') AS parent(pname)
CROSS APPLY
parent.pname.nodes('children/*') AS child(cname)
Using CROSS APPLY
I can select the top-level grandparent
node and use this to select the child parent
nodes and so on. Using this method I have taken my query from executing in around 1 minute 30 seconds down to around 6 seconds.
Interestingly though, if I use the "old" OPEN XML
method to retrieve the same data, the query executes in 1 second!
It seems like you may have to approach the use of these two techniques on a case-by-case basis depending on the expected size/complexity of the document being passed in.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With