Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

XQuery Plan Complexity

I have a SQLCLR scalar function that will steam an XmlReader that I need to shred on demand into an inline resultset. These XML objects are generated on demand so I can't use an XML index. It will be common to have over 100 columns in the resulting datasets. Consider this sample code:

CREATE XML SCHEMA COLLECTION RAB AS '
<xsd:schema xmlns:schema="urn:schemas-microsoft-com:sql:SqlRowSet1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" elementFormDefault="qualified">
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />

  <xsd:element name="r" type="r"/>

  <xsd:complexType name="r">
    <xsd:attribute name="a" type="sqltypes:int" use="required"/>
    <xsd:attribute name="b" type="sqltypes:int" use="required"/>
    <xsd:attribute name="c" type="sqltypes:int" use="required"/>
  </xsd:complexType>
</xsd:schema>';
GO

DECLARE @D TABLE(x XML(DOCUMENT RAB) NOT NULL);

INSERT INTO @D
VALUES
('<r a="3" b="4" c="34"/>'),
('<r a="5" b="6" c="56"/>'),
('<r a="7" b="8" c="78"/>')

SELECT x.value('/r/@a', 'int') a, x.value('/r/@b', 'int') b, x.value('/r/@c', 'int') c
FROM @d a

This fills a typed XML column in a table variable with some XML values and breaks the attributes into separate columns. The execution plan for this seems overly messy:

 |--Compute Scalar(DEFINE:([Expr1009]=[Expr1008], [Expr1016]=[Expr1015], [Expr1023]=[Expr1022]))
     |--Nested Loops(Inner Join, OUTER REFERENCES:([a].[x]))
        |--Nested Loops(Inner Join, OUTER REFERENCES:([a].[x]))
          |    |--Nested Loops(Inner Join, OUTER REFERENCES:([a].[x]))
          |    |    |--Table Scan(OBJECT:(@d AS [a]))
          |    |    |--Stream Aggregate(DEFINE:([Expr1008]=MIN([Expr1024])))
          |    |         |--Compute Scalar(DEFINE:([Expr1024]=CASE WHEN datalength(CONVERT_IMPLICIT(sql_variant,CONVERT_IMPLICIT(nvarchar(64),xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],XML Reader with XPath filter.[lvalue],XML Reader wi
          |    |              |--Table-valued function
          |    |--Stream Aggregate(DEFINE:([Expr1015]=MIN([Expr1025])))
          |         |--Compute Scalar(DEFINE:([Expr1025]=CASE WHEN datalength(CONVERT_IMPLICIT(sql_variant,CONVERT_IMPLICIT(nvarchar(64),xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],XML Reader with XPath filter.[lvalue],XML Reader with XP
          |              |--Table-valued function
          |--Stream Aggregate(DEFINE:([Expr1022]=MIN([Expr1026])))
               |--Compute Scalar(DEFINE:([Expr1026]=CASE WHEN datalength(CONVERT_IMPLICIT(sql_variant,CONVERT_IMPLICIT(nvarchar(64),xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],XML Reader with XPath filter.[lvalue],XML Reader with XPath f
                    |--Table-valued function

It's got a nested loop for each column! The query plan is going to be way too complex if I join multiple of these tables with 100 columns each. Also, I don't understand the purpose of those StreamAggregate operators. The the contents go like this:

MIN(
    CASE WHEN @d.[x] as [a].[x] IS NULL 
    THEN NULL ELSE 
    CASE WHEN datalength(CONVERT_IMPLICIT(sql_variant,
            CONVERT_IMPLICIT(nvarchar(64),xsd_cast_to_maybe_large(xrpf.[value],xrpf.[lvalue],xrpf.[lvaluebin],xrpf.[tid],(15),(7))
            ,0),0))>=(128)
    THEN CONVERT_IMPLICIT(int,
        CASE WHEN datalength(xsd_cast_to_maybe_large(xrpf.[value],xrpf.[lvalue],xrpf.[lvaluebin],xrpf.[tid],(15),(7)))<(128)
        THEN NULL
        ELSE xsd_cast_to_maybe_large(xrpf.[value],xrpf.[lvalue],xrpf.[lvaluebin],xrpf.[tid],(15),(7))
        END,0)
        ELSE CONVERT_IMPLICIT(int,
            CONVERT_IMPLICIT(sql_variant,CONVERT_IMPLICIT(nvarchar(64),xsd_cast_to_maybe_large(xrpf.[value],xrpf.[lvalue],xrpf.[lvaluebin],xrpf.[tid],(15),(7)),0),0),0)
    END
END)

Yuck! I thought using a typed XML group with a sqltype type was supposed to avoid the conversion?

Either I'm overestimating how efficient this is going to be or I'm doing something wrong. My question is how can I fix this so I don't have extra query plan operators added for each column and ideally avoid the conversions, or should I give up and find a non-xpath way of doing this?

References:

sqlTypes http://msdn.microsoft.com/en-us/library/ee320775%28v=sql.105%29.aspx

XML data type methods http://technet.microsoft.com/en-us/library/ms190798%28v=sql.105%29.aspx

enter image description here

like image 498
Chris Smith Avatar asked Oct 31 '14 22:10

Chris Smith


People also ask

What is XQuery used for?

What is XQuery For? XQuery was devised primarily as a query language for data stored in XML form. So its main role is to get information out of XML databases — this includes relational databases that store XML data, or that present an XML view of the data they hold.

What is XQuery and XPath?

XQuery is an active programming language which is used to interact with XML data groups. XPath is an XML method language which is applied for node selection in XML dataset using queries.

What is the correct syntax for comments in XQuery?

XQuery comments, delimited by (: and :) , can be added to any query to provide more information about the query itself. These comments are ignored during processing.


1 Answers

There are some mysteries in the query plan that needs to be sorted out first. What does the compute scalar do a and why is there a stream aggregate.

The table valued function returns a node table of the shredded XML, one row for each shredded row. When you use typed XML those columns are value, lvalue, lvaluebin and tid. Those columns are used in the compute scalar to calculate the actual value. The code in there looks a bit strange and I can't say that I understand why it is as it is but the gist of it is that the function xsd_cast_to_maybe_large returns the value and there is code that handles the case when the value is equal to and greater than 128 bytes.

CASE WHEN datalength(
                     CONVERT_IMPLICIT(sql_variant,
                        CONVERT_IMPLICIT(nvarchar(64),
                                         xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],
                                                                 XML Reader with XPath filter.[lvalue],
                                                                 XML Reader with XPath filter.[lvaluebin],
                                                                 XML Reader with XPath filter.[tid],(15),(5),(0)),0),0))>=(128) 
  THEN CONVERT_IMPLICIT(int,CASE WHEN datalength(xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],
                                                                         XML Reader with XPath filter.[lvalue],
                                                                         XML Reader with XPath filter.[lvaluebin],
                                                                         XML Reader with XPath filter.[tid],(15),(5),(0)))<(128) 
                              THEN NULL 
                              ELSE xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],
                                                           XML Reader with XPath filter.[lvalue],
                                                           XML Reader with XPath filter.[lvaluebin],
                                                           XML Reader with XPath filter.[tid],(15),(5),(0)) 
                            END,0) 
  ELSE CONVERT_IMPLICIT(int,CONVERT_IMPLICIT(sql_variant,
                                               CONVERT_IMPLICIT(nvarchar(64),
                                                                xsd_cast_to_maybe_large(XML Reader with XPath filter.[value],
                                                                                        XML Reader with XPath filter.[lvalue],
                                                                                        XML Reader with XPath filter.[lvaluebin],
                                                                                        XML Reader with XPath filter.[tid],(15),(5),(0)),0),0),0) 
END

The same compute scalar for non typed XML is much simpler and actually understandable.

CASE WHEN datalength(XML Reader with XPath filter.[value])>=(128) 
  THEN CONVERT_IMPLICIT(int,XML Reader with XPath filter.[lvalue],0) 
  ELSE CONVERT_IMPLICIT(int,XML Reader with XPath filter.[value],0) 
END

If there are more than 128 bytes in value fetch from lvalue else fetch from value. In the case with non typed XML the returned node table only outputs the columns id, value and lvalue.

When you use typed XML the storage of the node values are optimized based on the datatype specified in the schema. Looks like it could either end up in value, lvalue or lvaluebin in the node table depending on what type of value it is and xsd_cast_to_maybe_large is there to help sort things out.

The stream aggregate does a min() over the returned values from the compute scalar. We know and SQL Server does (at least sometimes) knows that there will only ever be one row returned from the table valued function when you specify an XPath in the value() function. The parser makes sure that we build the XPath correctly but when the query optimizer looks at the estimated rows it sees 200 rows. The base estimate for the table valued function that parses XML is 10000 rows and then there is some adjustments made using the XPath used. In this case it ends up with 200 rows where there is only one. Pure speculation on my part is that the stream aggregate is there to take care of this discrepancy. It will never aggregate anything, only send the one row through that is returned but it does affect the cardinality estimate for the entire branch and makes sure the optimizer uses 1 rows as an estimate for that branch. That is of course really important when the optimizer chooses join strategies etc.

So how about 100 attributes? Yes, there will be 100 branches if you use the value function 100 times. But there are some optimizations to be done here. I created a test rig to see what shape and form of the query would be the fastest using 100 attributes over 10 rows.

The winner was to use untyped XML and not to use the nodes() function to shred on r.

select X.value('(/r/@a1)[1]', 'int') as a1,
       X.value('(/r/@a2)[1]', 'int') as a2,
       X.value('(/r/@a3)[1]', 'int') as a3
from @T

There is also a way to avoid the 100 branches using pivot but depending on what your actual query looks like it might not be possible. The data type coming out from the pivot must be the same. You could of course extract them as a string and convert to appropriate type in the column list. It also requires that your table has a primary/unique key.

select a1, a2, a3
from (
     select T.ID, -- primary key of @T
            A.X.value('local-name(.)', 'nvarchar(50)') as Name,
            A.X.value('.', 'int') as Value
     from @T as T
       cross apply T.X.nodes('/r/@*') as A(X)
     ) as T
pivot(min(T.Value) for Name in (a1, a2, a3)) as P  

Query plan for pivot query, 10 rows 100 attributes:

enter image description here

Below is the results and the test rig I used. I tested with 100 attributes and 10 rows and all int attributes.

Result:

Test                                                Duration (ms)
--------------------------------------------------  -------------
untyped XML value('/r[1]/@a')                       195     
untyped XML value('(/r/@a)[1]')                     108
untyped XML value('@a') cross apply nodes('/r')     131  
untyped XML value('@a') cross apply nodes('/r[1]')  127
typed XML value('/r/@a')                            185
typed XML value('(/r/@a)[1]')                       148
typed XML value('@a') cross apply nodes('/r')       176
untyped XML pivot                                    34
typed XML pivot                                      52

Code:

drop type dbo.TRABType
drop type dbo.TType;
drop xml schema collection dbo.RAB;

go

declare @NumAtt int = 100;
declare @Attribs nvarchar(max);

with xmlnamespaces('http://www.w3.org/2001/XMLSchema' as xsd)
select @Attribs = (
select top(@NumAtt) 'a'+cast(row_number() over(order by 1/0) as varchar(11)) as '@name',

                    'sqltypes:int' as '@type',
                    'required' as '@use'
from sys.columns
for xml path('xsd:attribute')
)
--CREATE XML SCHEMA COLLECTION RAB AS 

declare @Schema nvarchar(max) = 
'
<xsd:schema xmlns:schema="urn:schemas-microsoft-com:sql:SqlRowSet1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" elementFormDefault="qualified">
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
  <xsd:element name="r" type="r"/>
  <xsd:complexType name="r">[ATTRIBS]</xsd:complexType>
</xsd:schema>';

set @Schema = replace(@Schema, '[ATTRIBS]', @Attribs)

create xml schema collection RAB as @Schema

go

create type dbo.TType as table
(
  ID int identity primary key,
  X xml not null
);

go

create type dbo.TRABType as table
(
  ID int identity primary key,
  X xml(document rab) not null
);


go

declare @NumAtt int = 100;
declare @NumRows int = 10;

declare @X nvarchar(max);
declare @C nvarchar(max);
declare @M nvarchar(max);

declare @S1 nvarchar(max);
declare @S2 nvarchar(max);
declare @S3 nvarchar(max);
declare @S4 nvarchar(max);
declare @S5 nvarchar(max);
declare @S6 nvarchar(max);
declare @S7 nvarchar(max);
declare @S8 nvarchar(max);
declare @S9 nvarchar(max);

set @X = N'<r '+
  (
  select top(@NumAtt) 'a'+cast(row_number() over(order by 1/0) as varchar(11))+'="'+cast(row_number() over(order by 1/0) as varchar(11))+'" '
  from sys.columns
  for xml path('')
  )+
'/>';

set @C = 
  stuff((
  select top(@NumAtt) ',a'+cast(row_number() over(order by 1/0) as varchar(11))
  from sys.columns
  for xml path('') 
  ), 1, 1, '')

set @M = 
  stuff((
  select top(@NumAtt) ',MAX(CASE WHEN name = ''a'+cast(row_number() over(order by 1/0) as varchar(11))+''' THEN val END)'
  from sys.columns
  for xml path('') 
  ), 1, 1, '')


declare @T dbo.TType;
insert into @T(X)
select top(@NumRows) @X
from sys.columns;

declare @TRAB dbo.TRABType;
insert into @TRAB(X)
select top(@NumRows) @X
from sys.columns;


-- value('/r[1]/@a')
set @S1 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T.X.value(''/r[1]/@a'+cast(row_number() over(order by 1/0) as varchar(11))+''', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @T as T
option (maxdop 1)';

-- value('(/r/@a)[1]')
set @S2 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T.X.value(''(/r/@a'+cast(row_number() over(order by 1/0) as varchar(11))+')[1]'', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @T as T
option (maxdop 1)';

-- value('@a') cross apply nodes('/r')
set @S3 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T2.X.value(''@a'+cast(row_number() over(order by 1/0) as varchar(11))+''', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @T as T
  cross apply T.X.nodes(''/r'') as T2(X)
option (maxdop 1)';


-- value('@a') cross apply nodes('/r[1]')
set @S4 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T2.X.value(''@a'+cast(row_number() over(order by 1/0) as varchar(11))+''', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @T as T
  cross apply T.X.nodes(''/r[1]'') as T2(X)
option (maxdop 1)';

-- value('/r/@a') typed XML
set @S5 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T.X.value(''/r/@a'+cast(row_number() over(order by 1/0) as varchar(11))+''', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @TRAB as T
option (maxdop 1)';

-- value('(/r/@a)[1]')
set @S6 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T.X.value(''(/r/@a'+cast(row_number() over(order by 1/0) as varchar(11))+')[1]'', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @TRAB as T
option (maxdop 1)';

-- value('@a') cross apply nodes('/r') typed XML
set @S7 = N'
select T.ID'+
 (
 select top(@NumAtt) ', T2.X.value(''@a'+cast(row_number() over(order by 1/0) as varchar(11))+''', ''int'')'
 from sys.columns
 for xml path('')
 )+
' from @TRAB as T
  cross apply T.X.nodes(''/r'') as T2(X)
option (maxdop 1)';

-- pivot
set @S8 = N'
select ID, '+@C+' 
from (
     select T.ID,
            A.X.value(''local-name(.)'', ''nvarchar(50)'') as Name,
            A.X.value(''.'', ''int'') as Value
     from @T as T
       cross apply T.X.nodes(''/r/@*'') as A(X)
     ) as T
pivot(min(T.Value) for Name in ('+@C+')) as P
option (maxdop 1)';  

-- typed pivot
set @S9 = N'
select ID, '+@C+' 
from (
     select T.ID,
            A.X.value(''local-name(.)'', ''nvarchar(50)'') as Name,
            cast(cast(A.X.query(''string(.)'') as varchar(11)) as int) as Value
     from @TRAB as T
       cross apply T.X.nodes(''/r/@*'') as A(X)
     ) as T
pivot(min(T.Value) for Name in ('+@C+')) as P
option (maxdop 1)';  


exec sp_executesql @S1, N'@T dbo.TType readonly', @T;
exec sp_executesql @S2, N'@T dbo.TType readonly', @T;
exec sp_executesql @S3, N'@T dbo.TType readonly', @T;
exec sp_executesql @S4, N'@T dbo.TType readonly', @T;
exec sp_executesql @S5, N'@TRAB dbo.TRABType readonly', @TRAB;
exec sp_executesql @S6, N'@TRAB dbo.TRABType readonly', @TRAB;
exec sp_executesql @S7, N'@TRAB dbo.TRABType readonly', @TRAB;
exec sp_executesql @S8, N'@T dbo.TType readonly', @T;
exec sp_executesql @S9, N'@TRAB dbo.TRABType readonly', @TRAB;
like image 69
Mikael Eriksson Avatar answered Oct 16 '22 23:10

Mikael Eriksson