Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing nested xml into denormalized table

How to parse nested xml in sql server into single table. Considering RowGuid is unique for each Customer

eg

I want to parse this xml in a single table which would be denormalized and will contain one to many relationships. Considering every nesting has business primary key.

<Customers>
    <Customer>
         <Type xsi:nil="true" />
          <RowGuid>FEFF32BC-1DAB-4F8A-80F0-CFE293C0BEC4</RowGuid>
          <AccountId>0</AccountId>
          <AccountNumber>bdb8eb51-d</AccountNumber>
          <AccountTransactions>
            <AccountTransaction>
                <PaymentDate>2012-09-13 22:19:58</PaymentDate>
                <Balance>500</Balance>
            </AccountTransaction>
            <AccountTransaction>
                <PaymentDate>2012-09-13 22:19:58</PaymentDate>
                <Balance>500</Balance>
            </AccountTransaction>
         </AccountTransactions>
        <Addresses>
             <Address>
                    <city>DELHI</city>
             </Address>
             <Address>
                    <city>MUMBAI</city>
             </Address>
         </Addresses>
      </Customer>
    <Customer>
         <Type xsi:nil="true" />
          <RowGuid>C3D4772E-1DAB-4F8A-80F0-CFE293C0BEC4</RowGuid>
          <AccountId>0</AccountId>
          <AccountNumber>bdb8eb51-d</AccountNumber>
          <AccountTransactions>
            <AccountTransaction>
                <PaymentDate>2012-09-13 22:19:58</PaymentDate>
                <Balance>500</Balance>
            </AccountTransaction>
            <AccountTransaction>
                <PaymentDate>2012-09-13 22:19:58</PaymentDate>
                <Balance>500</Balance>
            </AccountTransaction>
         </AccountTransactions>
      </Customer>

like image 369
Shantanu Gupta Avatar asked Sep 14 '12 05:09

Shantanu Gupta


1 Answers

If table doesn't need to be normalized you can do LEFT JOIN. I also added a namespace to the Customers element, it is needed because of xsi:nil="true". Try it:

DECLARE @xml XML =
'<Customers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Customer>
         <Type xsi:nil="true" />
          <RowGuid>FEFF32BC-1DAB-4F8A-80F0-CFE293C0BEC4</RowGuid>
          <AccountId>0</AccountId>
          <AccountNumber>bdb8eb51-d</AccountNumber>
          <AccountTransactions>
            <AccountTransaction>
                <PaymentDate>2012-09-13 22:19:58</PaymentDate>
                <Balance>500</Balance>
            </AccountTransaction>
            <AccountTransaction>
                <PaymentDate>2012-09-13 22:19:58</PaymentDate>
                <Balance>500</Balance>
            </AccountTransaction>
         </AccountTransactions>
        <Addresses>
             <Address>
                    <city>DELHI</city>
             </Address>
             <Address>
                    <city>MUMBAI</city>
             </Address>
         </Addresses>
      </Customer>
    <Customer>
         <Type xsi:nil="true" />
          <RowGuid>C3D4772E-1DAB-4F8A-80F0-CFE293C0BEC4</RowGuid>
          <AccountId>0</AccountId>
          <AccountNumber>bdb8eb51-d</AccountNumber>
          <AccountTransactions>
            <AccountTransaction>
                <PaymentDate>2012-09-13 22:19:58</PaymentDate>
                <Balance>500</Balance>
            </AccountTransaction>
            <AccountTransaction>
                <PaymentDate>2012-09-13 22:19:58</PaymentDate>
                <Balance>500</Balance>
            </AccountTransaction>
         </AccountTransactions>
      </Customer>
</Customers>'

SELECT  a.[Type],
        a.RowGuid,
        a.AccountId,
        a.AccountNumber,
        b.PaymentDate,
        b.Balance,
        c.[Address]
FROM    
(
    SELECT  
            Customer.value('Type[1]', 'VARCHAR(500)') [Type],
            Customer.value('RowGuid[1]', 'UNIQUEIDENTIFIER') RowGuid,
            Customer.value('AccountId[1]', 'INT') AccountId,
            Customer.value('AccountNumber[1]', 'VARCHAR(500)') AccountNumber
    FROM    @xml.nodes('/Customers/Customer') tbl(Customer)
) a
LEFT JOIN
(
    SELECT  
            AccountTransaction.value('PaymentDate[1]', 'DATETIME') PaymentDate,
            AccountTransaction.value('Balance[1]', 'DECIMAL(20, 2)') Balance,
            AccountTransaction.value('../../RowGuid[1]', 'UNIQUEIDENTIFIER') RowGuid
    FROM    @xml.nodes('/Customers/Customer/AccountTransactions/AccountTransaction') tbl(AccountTransaction)
)   b ON
    a.RowGuid = b.RowGuid
LEFT JOIN
(
    SELECT  
            Address.value('city[1]', 'VARCHAR(500)') [Address],
            Address.value('../../RowGuid[1]', 'UNIQUEIDENTIFIER') RowGuid
    FROM    @xml.nodes('/Customers/Customer/Addresses/Address') tbl(Address)        
)   c ON
    a.RowGuid = c.RowGuid

UPDATE:

Due to the large query cost of the first version of this query (the one that uses XML data type methods) I created another version that uses OPENXML instead of nodes and value methods. There's a great difference in cost in favor of OPENXML approach:

DECLARE @handle INT

CREATE TABLE #Customer (Type VARCHAR(500),
    RowGuid UNIQUEIDENTIFIER,
    AccountId INT,
    AccountNumber VARCHAR(500)
)

CREATE TABLE #AccountTransaction (
    PaymentDate DATETIME,
    Balance DECIMAL(20, 2),
    RowGuid UNIQUEIDENTIFIER
)

CREATE TABLE #Address (
    City VARCHAR(500),
    RowGuid UNIQUEIDENTIFIER
)

EXEC sp_xml_preparedocument @handle OUTPUT, @xml

INSERT  #Customer
SELECT  *
FROM    OPENXML(@handle, '/Customers/Customer', 2)
WITH    (
        Type VARCHAR(500),
        RowGuid UNIQUEIDENTIFIER,
        AccountId INT,
        AccountNumber VARCHAR(500)
)

INSERT  #AccountTransaction
SELECT  *
FROM    OPENXML(@handle, '/Customers/Customer/AccountTransactions/AccountTransaction', 2)
WITH    (
        PaymentDate DATETIME,
        Balance DECIMAL(20, 2),
        RowGuid UNIQUEIDENTIFIER '../../RowGuid[1]'
)

INSERT  #Address
SELECT  *
FROM    OPENXML(@handle, '/Customers/Customer/Addresses/Address', 2)
WITH    (
        city VARCHAR(500),
        RowGuid UNIQUEIDENTIFIER '../../RowGuid[1]'
)

SELECT  a.*,
        b.PaymentDate,
        b.Balance,
        c.City
FROM    #Customer a
LEFT    JOIN #AccountTransaction b ON
        b.RowGuid = a.RowGuid
LEFT    JOIN #Address c ON
        c.RowGuid = a.RowGuid

EXEC sp_xml_removedocument @handle
like image 82
Ivan Golović Avatar answered Oct 17 '22 02:10

Ivan Golović