Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

List or concatenate in SQL window function

A (relatively new to SQL) member of my team was working on writing an SQL query that happened to use a window function. Upon reviewing I noted that they structured their window function like this:

COUNT(*) OVER(PARTITION BY Part1+Part2) AS A

Which I instantly made a feedback note to say it should be like this:

COUNT(*) OVER(PARTITION BY Part1, Part2) AS A

Both Part1 and Part2 are nvarchars.

Then I paused to reflect and I couldn't actually work out why that would be wrong. As far as I can see that would actually produce identical results (it does). The actual execution plan is nearly identical aside from an extra Compute Scalar step after the initial table scan on the first query (this is 0% of query cost). The I/O statistics show that the first version has 5 fewer logical reads (12,665 to 12,670).

So is there any benefit/detriment to using either form, aside from coding conventions? Is it a case that this works fine in this instance, but in certain circumstances could produce inconsistent results?

like image 621
CobaltZorch Avatar asked Sep 30 '19 13:09

CobaltZorch


People also ask

How do I concatenate a list in SQL?

You can concatenate rows into single string using COALESCE method. This COALESCE method can be used in SQL Server version 2008 and higher. All you have to do is, declare a varchar variable and inside the coalesce, concat the variable with comma and the column, then assign the COALESCE to the variable.

How do you concatenate the results of SQL query?

To append a string to another and return one result, use the || operator. This adds two strings from the left and right together and returns one result. If you use the name of the column, don't enclose it in quotes. However, in using a string value as a space or text, enclose it in quotes.

What are windowing functions in SQL?

In SQL, a window function or analytic function is a function which uses values from one or multiple rows to return a value for each row. (This contrasts with an aggregate function, which returns a single value for multiple rows.)


2 Answers

Both expressions are valid, but they do not do the same thing.

Consider the following data:

Part1    Part2
AB       C
A        BC

When concatenating strings with PARTITION BY Part1+Part2 both records fall in the same partition, whereas when using PARTITION BY Part1, Part2, they would belong to different partitions.

So the question actually comes down to: what is the correct partitioning criteria for your use case? Usually, unless you are doing something fancy, you want PARTITION BY Part1, Part2. But this actually has to be answered from functional perspective, based on your real use case.

like image 193
GMB Avatar answered Oct 19 '22 00:10

GMB


The PARTITION expression is exactly that -- an expression. So you can put almost any form of an expression in there and use that value to partition the rows.

In terms of inconsistent results, you will run into a problem if you have this case:

Part1    Part2    Part1 + Part2
'yummy'  'sushi'  'yummysushi'
'yumm'   'ysushi' 'yummysushi'

Both rows would be considered to be part of the same partition, even though the columns have different values.

In terms of performance, my only guess is if you have an index or any partitioning scheme set up on those particular columns, you might get an improvement there.

Best bet is to use the second case you specified (Part1, Part2).

like image 45
ravioli Avatar answered Oct 18 '22 23:10

ravioli