Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I improve the performance of this JSON conversion in SQL Server 2016?

I have a table that resembles the following (the following code creates a table called #Temp. This has 160,000 rows which is roughly the same number of rows I am working with in my real dataset but there are more columns in the real dataset):

/* Create dummy employees*/

;WITH employeeNumbers
AS ( SELECT 1 AS employeeId
     UNION ALL
     SELECT employeeNumbers.employeeId + 1
     FROM   employeeNumbers
     WHERE  employeeNumbers.employeeId < 16000 )
SELECT *
INTO   #employeeId
FROM   employeeNumbers
OPTION ( MAXRECURSION 16000 )


/*Create saleItems*/
CREATE TABLE #SalesItems
    (
        category VARCHAR(100)
      , subCategory VARCHAR(100)
      , productName VARCHAR(1000)
    )
INSERT INTO #SalesItems ( category
                        , subCategory
                        , productName )
VALUES ( 'Furniture', 'Bookcases', 'Bush Somerset Collection Bookcase' )
     , ( 'Furniture', 'Chairs', 'Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back' )
     , ( 'Office Supplies', 'Labels', 'Self-Adhesive Address Labels for Typewriters by Universal' )
     , ( 'Furniture', 'Tables', 'Bretford CR4500 Series Slim Rectangular Table' )
     , ( 'Office Supplies', 'Storage', 'Eldon Fold n Roll Cart System' )
     , ( 'Furniture', 'Furnishings', 'Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood' )
     , ( 'Office Supplies', 'Art', 'Newell 322' )
     , ( 'Technology', 'Phones', 'Mitel 5320 IP Phone VoIP phone' )
     , ( 'Office Supplies', 'Binders', 'DXL Angle-View Binders with Locking Rings by Samsill' )
     , ( 'Technology', 'Phones', 'Samsung Galaxy S8' )

-- Create some random sales figures between 10 and 100
SELECT employeeId
     , category
     , subCategory
     , productName
     , CONVERT(DECIMAL(13, 2), 10 + ( 100 - 10 ) * RAND(CHECKSUM(NEWID()))) [Jul 2017]
     , CONVERT(DECIMAL(13, 2), 10 + ( 100 - 10 ) * RAND(CHECKSUM(NEWID()))) [Aug 2017]
     , CONVERT(DECIMAL(13, 2), 10 + ( 100 - 10 ) * RAND(CHECKSUM(NEWID()))) [Sep 2017]
     , CONVERT(DECIMAL(13, 2), 10 + ( 100 - 10 ) * RAND(CHECKSUM(NEWID()))) [Oct 2017]
     , CONVERT(DECIMAL(13, 2), 10 + ( 100 - 10 ) * RAND(CHECKSUM(NEWID()))) [Nov 2017]
     , CONVERT(DECIMAL(13, 2), 10 + ( 100 - 10 ) * RAND(CHECKSUM(NEWID()))) [Dec 2017]
INTO   #Temp
FROM   #employeeId
JOIN   #SalesItems ON 1 = 1

CREATE INDEX empId
    ON #Temp ( employeeId )

SELECT *
FROM   #Temp

What I am doing is converting these results to a single json string for every employee id in the table. My query is as follows:

SELECT DISTINCT x.employeeId
              , (   SELECT y.category
                         , y.subCategory
                         , y.productName
                         , [Jul 2017] AS 'salesAmounts.Jul 2017'
                         , [Aug 2017] AS 'salesAmounts.Aug 2017'
                         , [Sep 2017] AS 'salesAmounts.Sep 2017'
                         , [Oct 2017] AS 'salesAmounts.Oct 2017'
                         , [Nov 2017] AS 'salesAmounts.Nov 2017'
                         , [Dec 2017] AS 'salesAmounts.Dec 2017'
                    FROM   #Temp y
                    WHERE  y.employeeId = x.employeeId
                    FOR JSON PATH, INCLUDE_NULL_VALUES ) data
FROM   #Temp x

which works, but its performance is not great. In this sample it takes 25 seconds to do this but in my real dataset it takes a lot longer. It takes 1 second to return all the results from the #Temp table. Is there anyway I can redesign my query here to improve the query time? I did try using a cursor to iterate through each employeeId and generating the json string that way, but it still sucks.

like image 499
Liv Avatar asked Mar 06 '23 18:03

Liv


1 Answers

Read "Performance Surprises and Assumptions : GROUP BY vs. DISTINCT" by Aaron Bertrand

Try to use GROUP BY instead of DISTINCT. DISTINCT throws duplicates away after the resultset was created, thus calling your call to JSON more often than needed. GROUP BY should first reduce the set to distinct employeeId values and perform the JSON only once for each.

Cannot test it at the moment, but this should do the same, just faster:

SELECT x.employeeId
              , (   SELECT y.category
                         , y.subCategory
                         , y.productName
                         , [Jul 2017] AS 'salesAmounts.Jul 2017'
                         , [Aug 2017] AS 'salesAmounts.Aug 2017'
                         , [Sep 2017] AS 'salesAmounts.Sep 2017'
                         , [Oct 2017] AS 'salesAmounts.Oct 2017'
                         , [Nov 2017] AS 'salesAmounts.Nov 2017'
                         , [Dec 2017] AS 'salesAmounts.Dec 2017'
                    FROM   #Temp y
                    WHERE  y.employeeId = x.employeeId
                    FOR JSON PATH, INCLUDE_NULL_VALUES ) data
FROM   #Temp x
GROUP BY x.EmployeeId
like image 176
Shnugo Avatar answered Mar 10 '23 12:03

Shnugo