I've got a table in my MSSQL server, lets call it blogPost
. I've also got two tag tables, lets call them fooTag
and barTag
. The tag tables are used to tag the blogPost
table which are identically structured.
blogPost
| postId | title | body |
+--------+---------------------+-------------+
| 1 | The life on a query | lorem ipsum |
+--------+---------------------+-------------+
fooTag and barTag
| postId | tagName |
+--------+--------------+
| 1 | sql |
| 1 | query |
| 1 | select-query |
+--------+--------------+
I want to get a single blogpost along with all it's tags in a single row so then STRING_AGG()
feels suitable to make a query like this:
SELECT blogPost.*, STRING_AGG(fooTag.tagName, ';') as [fooTags], STRING_AGG(barTag.tagName, ';') as [barTags]
FROM blogPost
LEFT JOIN fooTag ON blogPost.postId = fooTag.postId
LEFT JOIN barTag ON blogPost.postId = barTag.postId
WHERE postId = 1
GROUP BY blogPost.postId, title, body
When making this query I'd expect to get the result
| postId | title | body | fooTags | barTags |
+--------+---------------------+-------------+-------------------------+-------------------------+
| 1 | The life on a query | lorem ipsum | sql;query;select-query | sql;query;select-query |
+--------+---------------------+-------------+-------------------------+-------------------------+
But I'm getting this result instead where bar tags (i.e. the last STRING_AGG selected) are duplicated.
| postId | title | body | fooTags | barTags |
+--------+---------------------+-------------+-------------------------+-----------------------------------------------+
| 1 | The life on a query | lorem ipsum | sql;query;select-query; | sql;sql;sql;query;query;query;select-query;select-query;select-query |
+--------+---------------------+-------------+-------------------------+-----------------------------------------------+
Putting barTags
last in the SELECT statement makes it so that barTags
gets the duplicates instead of fooTags
. The amount of duplicates created seem to be bound to the amount of rows columns being aggregated together in the first STRING_AGG
result column, so if fooTags
has 5 rows to aggregate together there will be 5 duplicates of each barTag
in the barTags
column in the result.
How would I get the result I want without duplicates?
Your problem is caused by each row in fooTags
creating that many rows of barTags
in the JOIN
, hence the duplication. You can work around this issue by performing the STRING_AGG
in the footags
and bartags
tables before JOIN
ing them:
SELECT blogPost.*, f.tags as [fooTags], b.tags as [barTags]
FROM blogPost
LEFT JOIN (SELECT postId, STRING_AGG(tagName, ';') AS tags
FROM fooTag
GROUP BY postId) f ON blogPost.postId = f.postId
LEFT JOIN (SELECT postId, STRING_AGG(tagName, ';') AS tags
FROM barTag
GROUP BY postId) b ON blogPost.postId = b.postId
WHERE postId = 1
You can simplify the query like so:
SELECT blogPost.*, ca1.*, ca2.*
FROM blogPost
OUTER APPLY (
SELECT STRING_AGG(tagName, ';')
FROM fooTag
WHERE blogPost.postId = fooTag.postId
) AS ca1(fooTags)
OUTER APPLY (
SELECT STRING_AGG(tagName, ';')
FROM barTag
WHERE blogPost.postId = barTag.postId
) AS ca2(barTags)
WHERE postId = 1
No GROUP BY required, in your case it'll be an expensive operation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With