Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiple STRING_AGG on multiple join columns causes bloated aggregation

Tags:

sql

sql-server

I've got a table in my MSSQL server, lets call it blogPost. I've also got two tag tables, lets call them fooTag and barTag. The tag tables are used to tag the blogPost table which are identically structured.

blogPost

| postId | title               |        body |
+--------+---------------------+-------------+
| 1      | The life on a query | lorem ipsum |
+--------+---------------------+-------------+

fooTag and barTag

| postId | tagName      |
+--------+--------------+
| 1      | sql          |
| 1      | query        |
| 1      | select-query |
+--------+--------------+

I want to get a single blogpost along with all it's tags in a single row so then STRING_AGG() feels suitable to make a query like this:

SELECT blogPost.*, STRING_AGG(fooTag.tagName, ';') as [fooTags], STRING_AGG(barTag.tagName, ';') as [barTags]
FROM blogPost
LEFT JOIN fooTag ON blogPost.postId = fooTag.postId
LEFT JOIN barTag ON blogPost.postId = barTag.postId
WHERE postId = 1
GROUP BY blogPost.postId, title, body

When making this query I'd expect to get the result

| postId | title               |        body | fooTags                 | barTags                 |
+--------+---------------------+-------------+-------------------------+-------------------------+
| 1      | The life on a query | lorem ipsum | sql;query;select-query | sql;query;select-query |
+--------+---------------------+-------------+-------------------------+-------------------------+

But I'm getting this result instead where bar tags (i.e. the last STRING_AGG selected) are duplicated.

| postId | title               |        body | fooTags                 | barTags                                       |
+--------+---------------------+-------------+-------------------------+-----------------------------------------------+
| 1      | The life on a query | lorem ipsum | sql;query;select-query; | sql;sql;sql;query;query;query;select-query;select-query;select-query |
+--------+---------------------+-------------+-------------------------+-----------------------------------------------+

Putting barTags last in the SELECT statement makes it so that barTags gets the duplicates instead of fooTags. The amount of duplicates created seem to be bound to the amount of rows columns being aggregated together in the first STRING_AGG result column, so if fooTags has 5 rows to aggregate together there will be 5 duplicates of each barTag in the barTags column in the result.

How would I get the result I want without duplicates?

like image 757
Hannes Kindströmmer Avatar asked Mar 03 '23 08:03

Hannes Kindströmmer


2 Answers

Your problem is caused by each row in fooTags creating that many rows of barTags in the JOIN, hence the duplication. You can work around this issue by performing the STRING_AGG in the footags and bartags tables before JOINing them:

SELECT blogPost.*, f.tags as [fooTags], b.tags as [barTags]
FROM blogPost
LEFT JOIN (SELECT postId, STRING_AGG(tagName, ';') AS tags
           FROM fooTag
           GROUP BY postId) f ON blogPost.postId = f.postId
LEFT JOIN (SELECT postId, STRING_AGG(tagName, ';') AS tags
           FROM barTag
           GROUP BY postId) b ON blogPost.postId = b.postId
WHERE postId = 1
like image 133
Nick Avatar answered May 03 '23 07:05

Nick


You can simplify the query like so:

SELECT blogPost.*, ca1.*, ca2.*
FROM blogPost
OUTER APPLY (
    SELECT STRING_AGG(tagName, ';')
    FROM fooTag
    WHERE blogPost.postId = fooTag.postId
) AS ca1(fooTags)
OUTER APPLY (
    SELECT STRING_AGG(tagName, ';')
    FROM barTag
    WHERE blogPost.postId = barTag.postId
) AS ca2(barTags)
WHERE postId = 1

No GROUP BY required, in your case it'll be an expensive operation.

like image 21
Salman A Avatar answered May 03 '23 06:05

Salman A