A (relatively new to SQL) member of my team was working on writing an SQL query that happened to use a window function. Upon reviewing I noted that they structured their window function like this:
COUNT(*) OVER(PARTITION BY Part1+Part2) AS A
Which I instantly made a feedback note to say it should be like this:
COUNT(*) OVER(PARTITION BY Part1, Part2) AS A
Both Part1 and Part2 are nvarchars.
Then I paused to reflect and I couldn't actually work out why that would be wrong. As far as I can see that would actually produce identical results (it does). The actual execution plan is nearly identical aside from an extra Compute Scalar step after the initial table scan on the first query (this is 0% of query cost). The I/O statistics show that the first version has 5 fewer logical reads (12,665 to 12,670).
So is there any benefit/detriment to using either form, aside from coding conventions? Is it a case that this works fine in this instance, but in certain circumstances could produce inconsistent results?
You can concatenate rows into single string using COALESCE method. This COALESCE method can be used in SQL Server version 2008 and higher. All you have to do is, declare a varchar variable and inside the coalesce, concat the variable with comma and the column, then assign the COALESCE to the variable.
To append a string to another and return one result, use the || operator. This adds two strings from the left and right together and returns one result. If you use the name of the column, don't enclose it in quotes. However, in using a string value as a space or text, enclose it in quotes.
In SQL, a window function or analytic function is a function which uses values from one or multiple rows to return a value for each row. (This contrasts with an aggregate function, which returns a single value for multiple rows.)
Both expressions are valid, but they do not do the same thing.
Consider the following data:
Part1 Part2
AB C
A BC
When concatenating strings with PARTITION BY Part1+Part2
both records fall in the same partition, whereas when using PARTITION BY Part1, Part2
, they would belong to different partitions.
So the question actually comes down to: what is the correct partitioning criteria for your use case? Usually, unless you are doing something fancy, you want PARTITION BY Part1, Part2
. But this actually has to be answered from functional perspective, based on your real use case.
The PARTITION
expression is exactly that -- an expression. So you can put almost any form of an expression in there and use that value to partition the rows.
In terms of inconsistent results, you will run into a problem if you have this case:
Part1 Part2 Part1 + Part2
'yummy' 'sushi' 'yummysushi'
'yumm' 'ysushi' 'yummysushi'
Both rows would be considered to be part of the same partition, even though the columns have different values.
In terms of performance, my only guess is if you have an index or any partitioning scheme set up on those particular columns, you might get an improvement there.
Best bet is to use the second case you specified (Part1, Part2)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With