Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

shredding xml recursively into the database

I have the following XML data and the Element table.

DECLARE @input XML = '<root>
     <C1>
       <C2>
         <C3>           <C4>data1</C4>       </C3>         
       </C2>
       <C2>
         <C3>data2</C3>
       </C2>
     </C1>
     <D1>
        <D2>data3</D2>
        <D2>data4</D2>
     </D1>
    </root>'

Element table:( this is just an example so can be changed to match an appropriate solution.)

CREATE TABLE Element (  elementId INT IDENTITY PRIMARY KEY, 
elementName VARCHAR (200) NOT NULL, 
parentId INT,   
data VARCHAR(300) );

According to @input the root element is parent of C1 and D1, then C1 is C2 parent, ...

What is the solution for SQL server 2012/2014 to code a stored procedure with CTE (or any other type of SQL object) to recursively put all element names into the Element table?

data column fills with data in this case, the C4 and the second C3, and D2 elements have data the rest of element are null.

I also saw Hierarchical Data type and I wonder if that could be helpful to solve this problem?

like image 430
Fred Jand Avatar asked Jul 19 '14 14:07

Fred Jand


People also ask

What is XML shredding?

XML shredding is the process of extracting values from XML documents and using them to update tables in the database. This is done based on a user-defined mapping specification that maps the XML structure to the target tables.


1 Answers

With OpenXML you can get a table representation of your XML with ID and ParentID columns using the metaproperties.

Using the XML query in a merge will allow you to create a mapping table between the elementId identity column and the DOM node id from the XML.

The last step is to use the mapping table to update parentId in Element.

SQL Fiddle

MS SQL Server 2008 Schema Setup:

CREATE TABLE Element (  elementId INT IDENTITY PRIMARY KEY, 
elementName VARCHAR (200) NOT NULL, 
parentId INT,   
data VARCHAR(300) );

Query 1:

declare @input xml = '
<root>
  <C1>
    <C2>
      <C3>
        <C4>data1</C4>
      </C3>
    </C2>
    <C2>
      <C3>data2</C3>
    </C2>
  </C1>
  <D1>
    <D2>data3</D2>
    <D2>data4</D2>
  </D1>
</root>';

-- OpenXML handle
declare @D int;

-- Table that capture output of merge with mapping between 
-- DOM node id and the identity column elementID in Element 
declare @T table
(
  ID int,
  ParentID int,
  ElementID int
);

-- Parse XML and get a handle
exec sp_xml_preparedocument @D output, @input;

-- Add rows to Element and fill the mapping table @T
merge into dbo.Element as E
using ( 
      select *
      from openxml(@D, '//*') with 
        (
          ID int '@mp:id',
          ParentID int '@mp:parentid',
          Data varchar(300) 'text()',
          ElementName varchar(200) '@mp:localname'
        )
      ) as S
on 0 = 1
when not matched by target then
  insert (elementName, data) values (S.ElementName, S.data)
output S.ID, S.ParentID, inserted.elementID into @T;

-- Update parentId in Elemet
update E
set parentId =  T2.ElementID
from dbo.Element as E
  inner join @T as T1
    on E.elementId = T1.ElementID
  inner join @T as T2
    on T1.ParentID = T2.ID


-- Relase the XML document
exec sp_xml_removedocument @D;

select *
from Element;

Results:

| ELEMENTID | ELEMENTNAME | PARENTID |   DATA |
|-----------|-------------|----------|--------|
|         1 |        root |   (null) | (null) |
|         2 |          C1 |        1 | (null) |
|         3 |          C2 |        2 | (null) |
|         4 |          C3 |        3 | (null) |
|         5 |          C4 |        4 |  data1 |
|         6 |          C2 |        2 | (null) |
|         7 |          C3 |        6 |  data2 |
|         8 |          D1 |        1 | (null) |
|         9 |          D2 |        8 |  data3 |
|        10 |          D2 |        8 |  data4 |
like image 162
Mikael Eriksson Avatar answered Oct 13 '22 10:10

Mikael Eriksson