 

Using SQL to transpose/flatten XML structure to columns

I am using SQL Server (2008/2012). After a lot of searching I know there are similar answers, but I can't find an example or pointers that fit my case.

I have an XML column in a SQL Server table holding this data:

<Items>
 <Item>
  <FormItem>
    <Text>FirstName</Text>
    <Value>My First Name</Value>
  </FormItem>
  <FormItem>
    <Text>LastName</Text>
    <Value>My Last Name</Value>
  </FormItem>
  <FormItem>
    <Text>Age</Text>
    <Value>39</Value>
  </FormItem>
 </Item>
 <Item>
  <FormItem>
    <Text>FirstName</Text>
    <Value>My First Name 2</Value>
  </FormItem>
  <FormItem>
    <Text>LastName</Text>
    <Value>My Last Name 2</Value>
  </FormItem>
  <FormItem>
    <Text>Age</Text>
    <Value>40</Value>
  </FormItem>
 </Item>
</Items>

So although the structure of each <FormItem> is always the same, there can be multiple sets of form items (most commonly no more than 20-30).

I am essentially trying to return a query from SQL in the format below, i.e. dynamic columns based on /FormItem/Text:

FirstName         LastName         Age    ---> More columns as new `<FormItem>` are returned
My First Name     My Last Name     39          Whatever value etc..
My First Name 2   My Last Name 2   40          

At the moment I have the following:

select 
    Tab.Col.value('Text[1]','nvarchar(100)') as Question,
    Tab.Col.value('Value[1]','nvarchar(100)') as Answer
from
    @Questions.nodes('/Items/Item/FormItem') Tab(Col)

Of course, that doesn't transpose the XML rows into columns, and the fields are fixed anyway. I have been trying various "dynamic SQL" approaches in which the SQL does a distinct selection of (in my case) the <Text> nodes and then uses some sort of pivot, but I couldn't find the combination that returns the results as a dynamic set of columns for each row (each <Item> within the <Items> collection).

I'm sure it can be done having seen so many very similar examples, however again the solution eludes me!

Any help gratefully received!!

Dav.id asked Mar 01 '13 16:03


3 Answers

Parsing the XML is fairly expensive, so instead of parsing it once to build the dynamic query and again to get the data, you can create a temporary table holding a Name-Value list and use that as the source for a dynamic pivot query.
dense_rank is there to create the ID to pivot around: every <Item> gets its own ID, which becomes one output row.
To build the column list for the dynamic query, it uses the for xml path('') trick.
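
As a standalone illustration of that column-list step, here is a minimal sketch. The table variable @Names and its sample values are hypothetical and exist only for this demo; the select itself follows the same pattern as the query in this answer.

```sql
-- Hypothetical sample names, including one duplicate
declare @Names table (Name nvarchar(100));
insert into @Names (Name)
values (N'FirstName'), (N'LastName'), (N'Age'), (N'FirstName');

declare @Col nvarchar(max);

-- Each distinct row contributes ",[Name]". for xml path('') concatenates
-- the rows into one string, and substring(..., 2) drops the leading comma.
-- quotename() brackets each name so it is safe to use as a column name.
select @Col =
  (
  select distinct ',' + quotename(Name)
  from @Names
  for xml path(''), type
  ).value('substring(text()[1], 2)', 'nvarchar(max)');

-- One comma-separated, bracketed list with the duplicate removed
select @Col as ColumnList;
```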

This solution requires that your table has a primary key (ID). If you have the XML in a variable it can be somewhat simplified.

select dense_rank() over(order by ID, I.N) as ID,
       F.N.value('(Text/text())[1]', 'varchar(max)') as Name,
       F.N.value('(Value/text())[1]', 'varchar(max)') as Value
into #T
from YourTable as T
  cross apply T.XMLCol.nodes('/Items/Item') as I(N)
  cross apply I.N.nodes('FormItem') as F(N)

declare @SQL nvarchar(max)
declare @Col nvarchar(max)

select @Col = 
  (
  select distinct ','+quotename(Name)
  from #T
  for xml path(''), type
  ).value('substring(text()[1], 2)', 'nvarchar(max)')

set @SQL = 'select '+@Col+'
            from #T
            pivot (max(Value) for Name in ('+@Col+')) as P'

exec (@SQL)

drop table #T
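
For the case where the XML is in a variable rather than a table column, the simplification could look roughly like the sketch below. It assumes a variable named @Questions (as in the question) holding a shortened copy of the sample data, and it ranks on the <Item> node alone because there is no table ID to order by. Treat it as an untested adaptation of the query above rather than a definitive version.

```sql
-- Sample data from the question, shortened to two form items per <Item>
declare @Questions xml = N'
<Items>
  <Item>
    <FormItem><Text>FirstName</Text><Value>My First Name</Value></FormItem>
    <FormItem><Text>Age</Text><Value>39</Value></FormItem>
  </Item>
  <Item>
    <FormItem><Text>FirstName</Text><Value>My First Name 2</Value></FormItem>
    <FormItem><Text>Age</Text><Value>40</Value></FormItem>
  </Item>
</Items>';

-- Shred once into a Name-Value list; no table and no primary key involved,
-- so the <Item> node itself is what dense_rank orders by.
select dense_rank() over(order by I.N) as ID,
       F.N.value('(Text/text())[1]', 'nvarchar(max)') as Name,
       F.N.value('(Value/text())[1]', 'nvarchar(max)') as Value
into #T
from @Questions.nodes('/Items/Item') as I(N)
  cross apply I.N.nodes('FormItem') as F(N);

declare @SQL nvarchar(max);
declare @Col nvarchar(max);

-- Distinct, bracketed column names taken from the <Text> values
select @Col =
  (
  select distinct ',' + quotename(Name)
  from #T
  for xml path(''), type
  ).value('substring(text()[1], 2)', 'nvarchar(max)');

-- ID is not pivoted or aggregated, so it acts as the grouping column:
-- one output row per <Item>.
set @SQL = 'select ' + @Col + '
            from #T
            pivot (max(Value) for Name in (' + @Col + ')) as P';

exec (@SQL);

drop table #T;
```

Every pivoted column comes back as nvarchar(max), so a value such as Age has to be cast by the caller if a numeric type is needed.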


Mikael Eriksson answered Oct 22 '22 04:10


select Tab.Col.value('(FormItem[Text = "FirstName"]/Value)[1]', 'varchar(32)') as FirstName, 
        Tab.Col.value('(FormItem[Text = "LastName"]/Value)[1]', 'varchar(32)') as LastName, 
        Tab.Col.value('(FormItem[Text = "Age"]/Value)[1]', 'int') as Age
from @Questions.nodes('/Items/Item') Tab(Col)

muhmud answered Oct 22 '22 04:10


I wanted to add my own answer for completeness, in case it helps others. It is most definitely based on the great help from @Mikael above, so all kudos to @Mikael.

Basically I ended up with the following proc. I needed to select and filter some data, pull in some joined data, and allow optional filtering on some of the input params. The next section creates a temp table of my relational data plus the required XML nodes via cross apply. The final step pivots the results, dynamically creating the columns from the selected XML nodes.

CREATE PROCEDURE [dbo].[usp_RPT_ExtractFlattenentries]
    @CompanyID          int,
    @MainSelector       nvarchar(50) = null,
    @SecondarySelector      nvarchar(255) = null,
    @DateFrom           datetime = '01-jan-2012',
    @DateTo             datetime = '31-dec-2100',
    @SysReference       nvarchar(20) = null
AS
BEGIN
    SET NOCOUNT ON;

    --  Create the table var to hold the XML form data from the entries
    declare @FeedbackXml table (
        ID int identity primary key,
        XMLCol xml,
        CompanyName nvarchar(20),
        SysReference nvarchar(20),
        RecordDate datetime,
        EntryName  nvarchar(255),
        MainSelector nvarchar(50)
    )

    --  STEP 1: Get the raw submission data based on the params passed in
    --  *Note: The double casting is necessary as the "form" field is nvarchar (not varchar) and we need xml in UTF-8 format
    begin
        insert into @FeedbackXml
            (XMLCol, CompanyName, SysReference, RecordDate, EntryName, MainSelector)
        select cast(cast(e.form as nvarchar(max)) as xml), c.name, e.SysReference, e.RecordDate, e.name, e.wizard
        from 
            entries e
        left join
            companies c on e.companies = c.ID
        where 
            (@CompanyID = -1 or @CompanyID = e.companies)
        and
            (@MainSelector is null or @MainSelector = e.wizard)
        and
            (@SecondarySelector is null or @SecondarySelector = e.name)
        and
            (@SysReference is null or @SysReference = e.SysReference)
        and
            (e.RecordDate >= @DateFrom and e.RecordDate <= @DateTo)
    end

    --  STEP 2: Flatten the required XML structure to provide a base for the pivot, and include other fields we wish to output
    select dense_rank() over(order by ID) as ID,
            T.RecordDate, T.CompanyName, T.SysReference, T.EntryName, T.MainSelector,
            F.N.value('(FieldNameNode/text())[1]', 'nvarchar(max)') as FieldName,
            F.N.value('(FieldNameValue/text())[1]', 'nvarchar(max)') as FieldValue
    into #TempData
    from @FeedbackXml as T
        cross apply T.XMLCol.nodes('/root/companies') as I(N) -- XPath to the desired node start point (no trailing slash)
        cross apply I.N.nodes('company') as F(N) -- The actual node collection that forms the "field name" and "field value" data

    --  STEP 3: Pivot the #TempData table creating a dynamic column structure based on the selected XML nodes in step 2
    declare @SQL nvarchar(max)
    declare @Col nvarchar(max)

    select @Col = 
      (
      select distinct ','+quotename(FieldName)
      from #TempData
      for xml path(''), type
      ).value('substring(text()[1], 2)', 'nvarchar(max)')

    set @SQL = 'select CompanyName, SysReference, EntryName, MainSelector, RecordDate, '+@Col+'
                from #TempData
                pivot (max(FieldValue) for FieldName in ('+@Col+')) as P'

    exec (@SQL)
    drop table #TempData

END
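
A call to the procedure might look like the sketch below. The parameter values are made up for illustration; only the parameter names and the -1 convention come from the procedure definition above.

```sql
-- Hypothetical call: all companies, one calendar year, no other filters
exec dbo.usp_RPT_ExtractFlattenentries
     @CompanyID    = -1,          -- -1 disables the company filter (see the where clause)
     @MainSelector = null,        -- null means "do not filter on this column"
     @DateFrom     = '20130101',
     @DateTo       = '20131231';
```

Because the pivoted columns are built from whatever field names occur in the selected XML, the shape of the result set can change from one call to the next.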

Again, I only added this answer to give the complete picture from my side, in the hope that it helps others.

Dav.id answered Oct 22 '22 03:10