Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is CROSS APPLY needed when using XPath queries?

tl;dr

Why doesn't:

SELECT      SomeXmlColumn.nodes('/people/person') AS foo(b) FROM MyTable 

work?

The Before Question

Nearly ever answer I've seen (or gotten) for using XPath queries in SQL Server requires that you join the XML document back to itself using a CROSS APPLY.

Why?

For example:

SELECT     p.value('(./firstName)[1]', 'VARCHAR(8000)') AS firstName,    p.value('(./lastName)[1]', 'VARCHAR(8000)') AS lastName FROM table     CROSS APPLY field.nodes('/person') t(p) 

For example:

SELECT a.BatchXml.value('(Name)[1]', 'varchar(50)') AS Name,     a.BatchXml.value('(IDInfo/IDType)[1]', 'varchar(50)') AS IDType,     a.BatchXml.value('(IDInfo/IDOtherDescription)[1]', 'varchar(50)') AS IDOtherDescription FROM BatchReports b CROSS APPLY b.BatchFileXml.nodes('Customer') A(BatchXml) WHERE a.BatchXml.exist('IDInfo/IDType[text()=3]')=1 

For example:

SELECT  b.BatchID,         x.XmlCol.value('(ReportHeader/OrganizationReportReferenceIdentifier)[1]','VARCHAR(100)') AS OrganizationReportReferenceIdentifier,         x.XmlCol.value('(ReportHeader/OrganizationNumber)[1]','VARCHAR(100)') AS OrganizationNumber FROM    Batches b CROSS APPLY b.RawXml.nodes('/CasinoDisbursementReportXmlFile/CasinoDisbursementReport') x(XmlCol); 

And even from MSDN Books Online:

SELECT nref.value('first-name[1]', 'nvarchar(32)') FirstName,        nref.value('last-name[1]', 'nvarchar(32)') LastName FROM    [XmlFile] CROSS APPLY [Contents].nodes('//author') AS p(nref) 

They all use it. But nobody (not even the SQL Server Books Online) explains why it's needed, what problem it solves, what it's doing, or how it works.

Even the simplest case needs them

Even the simplest example of taking the XML:

<people>    <person><firstName>Jon</firstName><lastName>Johnson</lastName></person>    <person><firstName>Kathy</firstName><lastName>Carter</lastName></person>    <person><firstName>Bob</firstName><lastName>Burns</lastName></person> </people> 

and returning the values:

FirstName  LastName =========  ======== Jon        Johnson Kathy      Carter Bob        Burns 

needs a join:

SELECT     p.value('(./firstName)[1]', 'VARCHAR(8000)') AS firstName,    p.value('(./lastName)[1]', 'VARCHAR(8000)') AS lastName FROM table     CROSS APPLY field.nodes('/person') t(p) 

What's confusing is that it doesn't even use the table it joins from, why does it need it?

Since querying for XML has never been documented or explained, hopefully we can solve that now.

What does it actually do?

So let's start with an actual example, since we want an actual answer, that gives an actual explanation:

DECLARE @xml xml; SET @xml =  '<people>    <person><firstName>Jon</firstName><lastName>Johnson</lastName></person>    <person><firstName>Kathy</firstName><lastName>Carter</lastName></person>    <person><firstName>Bob</firstName><lastName>Burns</lastName></person> </people>'; ;WITH MyTable AS (     SELECT @xml AS SomeXmlColumn ) 

Now we have psuedo table we can query from:

enter image description here

Let's start with the obvious

First I need the people. In real XML, I can easily return the three rows:

/people/person 

Which gives a NodeList containing three nodes:

<person><firstName>Jon</firstName><lastName>Johnson</lastName></person> <person><firstName>Kathy</firstName><lastName>Carter</lastName></person> <person><firstName>Bob</firstName><lastName>Burns</lastName></person> 

In SQL Server, the same query:

SELECT     SomeXmlColumn.query('/people/person') FROM MyTable 

doesn't return three rows, but rather one row with the XML containing the three nodes:

<person>   <firstName>Jon</firstName>   <lastName>Johnson</lastName> </person> <person>   <firstName>Kathy</firstName>   <lastName>Carter</lastName> </person> <person>   <firstName>Bob</firstName>   <lastName>Burns</lastName> </person> 

Obviously this is unsuitable, when my end goal is to return 3 rows. I somehow have to break up the three rows into three rows.

Onto the names

My actual goal is to get the firstName and lastName. In XPath I could do something like:

/people/person/firstName|/people/person/lastName 

which gets me the six nodes, although they are not adjoining

<firstName>Jon</firstName> <lastName>Johnson</lastName> <firstName>Kathy</firstName> <lastName>Carter</lastName> <firstName>Bob</firstName> <lastName>Burns</lastName> 

In SQL Server, we try something similar

SELECT      SomeXmlColumn.query('/people/person/firstName') AS FirstName,     SomeXmlColumn.query('/people/person/lastName') AS LastName FROM MyTable 

which gets us one row, with each column containing an XML fragment:

FirstName                     LastName ============================  ============================ <firstName>Jon</firstName>    <lastName>Johnson</lastName> <firstName>Kathy</firstName>  <lastName>Carter</lastName> <firstName>Bob</firstName>    <lastName>Burns</lastName> 

...and now I'm tired. I've spent three hours writing this question, on top of the four hours I spent asking yesterday's question. I'll come back to this question later; when it's cooler in here, and I have more energy to beg for help.

Second wind

The fundamental problem is that no matter what I do, I keep getting only one row returned. I want three rows returned (because there are three people). SQL Server does have a function that can convert XML rows (called nodes) into SQL Server rows (called rows). It's the .nodes function:

The nodes() method is useful when you want to shred an xml data type instance into relational data. It allows you to identify nodes that will be mapped into a new row.

This means that you "call" the .nodes method with an XPath query on an xml data type. And what used to come back in SQL Server as one row with three nodes, comes back (correctly) as three nodes:

.nodes('/people/person') AS MyDerivedTable(SomeOtherXmlColumn) 

Conceptually this returns:

SomeOtherXmlColumn ------------------------------------------------------------------------ <person><firstName>Jon</firstName><lastName>Johnson</lastName></person> <person><firstName>Kathy</firstName><lastName>Carter</lastName></person> <person><firstName>Bob</firstName><lastName>Burns</lastName></person> 

But if you actually try to use it, it doesn't work:

DECLARE @xml xml; SET @xml =  '<people>    <person><firstName>Jon</firstName><lastName>Johnson</lastName></person>    <person><firstName>Kathy</firstName><lastName>Carter</lastName></person>    <person><firstName>Bob</firstName><lastName>Burns</lastName></person> </people>'; SELECT * FROM @xml.nodes('/people/person') AS MyDervicedTable(SomeOtherXmlColumn) 

Gives the error:

Msg 493, Level 16, State 1, Line 8
The column 'SomeOtherXmlColumn' that was returned from the nodes() method cannot be used directly. It can only be used with one of the four XML data type methods, exist(), nodes(), query(), and value(), or in IS NULL and IS NOT NULL checks.

I presume this is because I'm not allowed to look at the results set (i.e. the * is not allowed). No problem. I'll use the same .query I used originally:

SELECT SomeOtherXmlColumn.query('/') AS SomeOtherOtherXmlColumn FROM @xml.nodes('/people/person') AS MyDervicedTable(SomeOtherXmlColumn) 

Which returns rows. But rather than splitting a list of nodes into rows, it just duplicates the entire XML:

SomeOtherOtherXmlColumn ---------------------------------------- <people><person><firstName>Jon</firstName><lastName>Johnson</lastName></person><person><firstName>Kathy</firstName><lastName>Carter</lastName></person><person><firstName>Bob</firstName><lastName>Burns</lastName></person></people> <people><person><firstName>Jon</firstName><lastName>Johnson</lastName></person><person><firstName>Kathy</firstName><lastName>Carter</lastName></person><person><firstName>Bob</firstName><lastName>Burns</lastName></person></people> <people><person><firstName>Jon</firstName><lastName>Johnson</lastName></person><person><firstName>Kathy</firstName><lastName>Carter</lastName></person><person><firstName>Bob</firstName><lastName>Burns</lastName></person></people> 

Which makes sense. I was expecting an XPath query in SQL Server to behave like XPath. But a hindsight careful reading of the docs say otherwise:

The result of the nodes() method is a rowset that contains logical copies of the original XML instances. In these logical copies, the context node of every row instance is set to one of the nodes identified with the query expression, so that subsequent queries can navigate relative to these context nodes.

Now do it with an xml column

The preceding example was for a variable of type xml. Now we have to retrofit the .nodes function to work with a table containing an xml column:

SELECT     SomeXmlColumn.nodes('/people/person') FROM MyTable 

No, that doesn't work:

Msg 227, Level 15, State 1, Line 8
"nodes" is not a valid function, property, or field.

Although .nodes is a valid method of an xml data type, it simply doesn't work when you try to use it on an xml data type. Nor does it work on when used on an xml data type:

SELECT * FROM MyTable.SomeXmlColumn.nodes('/people/person') 

Msg 208, Level 16, State 1, Line 8
Invalid object name 'MyTable.SomeXmlColumn.nodes'.

Which I presume is why the CROSS APPLY modifier is needed. Not because you are joining anything, but because the SQL Server parser will refuse to recognize .nodes unless it's preceded with the keywords cross apply:

SELECT      'test' AS SomeTestColumn FROM MyTable CROSS APPLY MyTable.SomeXmlColumn.nodes('/people/person') AS MyDerivedTable(SomeOtherXmlColumn) 

And we start to get somewhere:

SomeTestColumn -------------- test test test 

And so if we then want to see the XML that comes back:

SELECT      SomeOtherXmlColumn.query('/') FROM (MyTable CROSS APPLY MyTable.SomeXmlColumn.nodes('/people/person') AS MyDerivedTable(SomeOtherXmlColumn)) 

Now we have three rows.

It seems that cross apply isn't used to a join, but merely a keyword that allows .nodes to work

And it seems that the SQL Server optimizer just refuses to accept any use of

.nodes 

and you must actually use:

CROSS APPLY .nodes 

And that's just how it is. And if that's the case - that's fine. That's the rule. And that led to years of confusion; thinking I was joining something to something else with the cross apply operator.

Except I believe there is more to it than that. There must, somehow, actually be a cross apply happening. But I cannot see where - or why.

like image 320
Ian Boyd Avatar asked May 06 '14 15:05

Ian Boyd


People also ask

What is the purpose of cross apply in SQL Server?

The CROSS APPLY operator is semantically similar to INNER JOIN. It retrieves all the records from the table where there are corresponding matching rows in the output returned by the table valued function.

What is cross apply in XML?

Cross Apply enables to execute SQL XML query on all rows of the database table instead of running SQL Select query on single XML data.

How does cross apply work?

CROSS APPLY returns only rows from the outer table that produce a result set from the table-valued function. It other words, result of CROSS APPLY doesn't contain any row of left side table expression for which no result is obtained from right side table expression. CROSS APPLY work as a row by row INNER JOIN.

What is cross apply and outer apply?

So you might conclude, the CROSS APPLY is equivalent to an INNER JOIN (or to be more precise its like a CROSS JOIN with a correlated sub-query) with an implicit join condition of 1=1 whereas the OUTER APPLY is equivalent to a LEFT OUTER JOIN.


1 Answers

Query:

SELECT x.i.value('(./text())[1]', 'VARCHAR(10)') FROM MyTable.SomeXmlColumn.nodes('./people/person/firstName') AS x(i); 

doesn't work, for the same reason why this query doesn't work:

SELECT * FROM Person.Person.FirstName; 

but this does:

SELECT FirstName FROM Person.Person; 

-

FROM clause expects rowset, so this is valid, since nodes() returns rowset:

DECLARE @xml AS XML =  '<people>    <person><firstName>Jon</firstName><lastName>Johnson</lastName></person>    <person><firstName>Kathy</firstName><lastName>Carter</lastName></person>    <person><firstName>Bob</firstName><lastName>Burns</lastName></person> </people>';  SELECT x.i.value('(./text())[1]', 'VARCHAR(10)') FROM @xml.nodes('./people/person/firstName') AS x(i); 

If xml is not a variable but value in table, we first need to extract rows from this value, and this is when CROSS APPLY comes in handy:

SELECT x.i.value('(./text())[1]', 'VARCHAR(10)') FROM MyTable as t CROSS APPLY     t.SomeXmlColumn.nodes('./people/person/firstName') AS x(i); 

CROSS APPLY operator applies the right expression to each record from the left table (MyTable).

  • In MyTable table there is one record containing xml.
  • CROSS APPLY fetches this record and exposes it to expression in the right.
  • Right expression extracts records using nodes() function.
  • As a result there are 1 x 3 = 3 records (xml nodes) which are then processed by SELECT clause.

Compare to 'normal' CROSS APPLY query:

SELECT c.CustomerID, soh.TotalDue, soh.OrderDate FROM Sales.Customer AS c CROSS APPLY     (SELECT TOP(2) TotalDue, OrderDate     FROM Sales.SalesOrderHeader     WHERE CustomerID = c.CustomerID ORDER BY TotalDue DESC) AS soh; 

c.CustomerID is our t.SomeXmlColumn

like image 105
anth Avatar answered Sep 22 '22 04:09

anth