
Using SQL concatenation with ORDER BY

I'm confused. How would you explain this difference in variable concatenation with ORDER BY?

declare @tbl table (id int);
insert into @tbl values (1), (2), (3);

declare @msg1 varchar(100) = '', @msg2 varchar(100) = '',
    @msg3 varchar(100) = '',    @msg4 varchar(100) = '';

select @msg1 = @msg1 + cast(id as varchar) from @tbl
order by id;

select @msg2 = @msg2 + cast(id as varchar) from @tbl
order by id+id;

select @msg3 = @msg3 + cast(id as varchar) from @tbl
order by id+id desc;

select TOP(100) @msg4 = @msg4 + cast(id as varchar) from @tbl
order by id+id;

select
    @msg1 as msg1,
    @msg2 as msg2,
    @msg3 as msg3,
    @msg4 as msg4;

Results

msg1  msg2  msg3  msg4
----  ----  ----  ----
123   3     1     123  
Asked by Alexander Sigachov, Mar 18 '15 13:03



1 Answer

As many have confirmed, this is not the right way to concatenate all the rows in a column into a variable - even though in some cases it does "work". If you want to see some alternatives, please check out this blog.
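
For reference, here is a minimal sketch of two commonly used alternatives, written against the table variable from the question. These are not necessarily the ones the linked blog covers: FOR XML PATH is available on the SQL Server versions discussed here, while STRING_AGG assumes SQL Server 2017 or later. The variable names @viaXml and @viaAgg are made up for illustration.

```sql
declare @tbl table (id int);
insert into @tbl values (1), (2), (3);

declare @viaXml varchar(100), @viaAgg varchar(100);

-- FOR XML PATH(''): the subquery builds one string, and its ORDER BY
-- controls the order in which the rows are appended.
select @viaXml = (
    select cast(id as varchar(10))
    from @tbl
    order by id
    for xml path(''), type
).value('.', 'varchar(100)');

-- STRING_AGG (assumes SQL Server 2017+): an aggregate with an explicit ordering clause.
select @viaAgg = string_agg(cast(id as varchar(10)), ',') within group (order by id)
from @tbl;

select @viaXml as via_xml,   -- 123
       @viaAgg as via_agg;   -- 1,2,3
```

Both forms assign a single value to the variable, so the result does not depend on where the optimizer places the row-by-row assignment.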

According to MSDN (applies to SQL Server 2008 through 2014 and Azure SQL Database), SET rather than SELECT is recommended for assigning local variables. The remarks describe how SELECT behaves when it is used for assignment anyway. The interesting points to note:

  • While typically it should only be used to return a single value to a variable, when the expression is the name of the column, it can return multiple values.
  • When the expression does return multiple values, the variable is assigned the last value that is returned.
  • If no value is returned, the variable retains its original value (not directly relevant here, but worth noting).
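
The documented cases in the list above can be checked with a short script. This is a sketch that reuses the table variable from the question; @v and the column aliases are hypothetical names. The multi-row case has no expected-value comment on purpose, because without a guaranteed order "the last value returned" is not something to rely on.

```sql
declare @tbl table (id int);
insert into @tbl values (1), (2), (3);

declare @v int = -1;

-- No row qualifies, so SELECT leaves the variable untouched.
select @v = id from @tbl where id > 100;
select @v as after_empty_select;   -- -1

-- By contrast, SET with a scalar subquery that returns no row assigns NULL.
set @v = (select id from @tbl where id > 100);
select @v as after_empty_set;      -- NULL

-- Several rows qualify: SELECT assigns once per row and keeps the last value it saw.
select @v = id from @tbl;
select @v as after_multi_row_select;

-- SET with a subquery that returns several rows fails instead (error 512):
-- set @v = (select id from @tbl);
```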

The first two points here are key - concatenation happens to work because SELECT @msg1 = @msg1 + cast(id as varchar) is essentially SELECT @msg1 += cast(id as varchar), and as the syntax notes, += is an accepted compound assignment operator on this expression. Note that you should not expect this operation to remain supported on VARCHAR as a way to do string concatenation - just because it happens to work in some situations doesn't mean it is OK for production code.

The underlying reason comes down to whether the Compute Scalar that runs on the select expression uses the original id column or an expression of the id column. You probably won't find any docs on why the optimizer chooses the specific plan for each query, but each example highlights a different case that lets the msg value be evaluated from the column (and therefore multiple rows being returned and concatenated) or from an expression (and therefore only the last value).

  1. @msg1 is '123' because the Compute Scalar (the row-by-row evaluation of the variable assignment) occurs after the Sort. This allows the scalar computation to return multiple values on the id column concatenating them through the += compound operator. I doubt there is specific documentation why, but it appears the optimizer chose to do the sort before the scalar computation because the order by was a column and not an expression.

  2. @msg2 is '3' because the Compute Scalar is done before the sort, which leaves the @msg2 in each row just being the ('' + id) - so never concatenated, just the value of the id. Again, probably not any documentation why the optimizer chose this, but it appears that since the order by was an expression, perhaps it needed to do the (id+id) in the order by as part of the scalar computation before it could sort. At this point, your original column is no longer referencing the source column, but it has been replaced by an expression. Therefore, as MSDN stated, your first column points to an expression, not a column, so the behavior assigns the last value of the result set to the variable in the SELECT. Since you sorted ASC, you get '3' here.

  3. @msg3 is '1' for the same reason as example 2, except you ordered DESC. Again, this becomes an expression in the evaluation - not the original column, so therefore the assignment gets the last value of the DESC order, so you get '1'.

  4. @msg4 is '123' again because the TOP operation forces an initial scalar evaluation of the ORDER BY so that it can determine your top 100 records. This is different than examples 2 and 3 in which the scalar computation contained both the order by and select computations which caused each example to be an expression and not refer back to the original column. Example 4 has the TOP separating the ORDER BY and SELECT computations, so after the SORT (TOP N SORT) is applied, it then does the scalar computation for the SELECT columns in which at this point you are still referencing the original column (not an expression of the column), and therefore it returns multiple rows allowing the concatenation to occur.
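
To see the operator placement described in the four cases for yourself, one option is to ask the server for the actual execution plans. This is a sketch, not a guaranteed reproduction: the plans (and therefore the results) may differ between SQL Server versions and builds. It shows only examples 1 and 2; the other two can be added the same way.

```sql
declare @tbl table (id int);
insert into @tbl values (1), (2), (3);

declare @msg1 varchar(100) = '', @msg2 varchar(100) = '';

-- Return the actual execution plan (as XML) for each statement that follows.
set statistics xml on;

select @msg1 = @msg1 + cast(id as varchar(10)) from @tbl
order by id;

select @msg2 = @msg2 + cast(id as varchar(10)) from @tbl
order by id+id;

set statistics xml off;

select @msg1 as msg1, @msg2 as msg2;
```

In each returned plan, compare where the Compute Scalar that carries the variable assignment sits relative to the Sort operator.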

Sources:

  • MSDN: https://msdn.microsoft.com/en-us/library/ms187330.aspx
Answered by Jason W, Nov 14 '22 23:11