On SQL Server configured as case-insensitive, group by
can have interesting results when the [n][var]char
column is not the first group by
column. Essentially, it looks like whatever row it encounters "first" (where "first" is undefined in the absence of an order): wins for that grouping. For example:
select x.[day], x.[name], count(1) as [count]
from (
select 1 as [day], 'a' as [name]
union all select 1, 'A'
union all select 2, 'A'
union all select 2, 'a'
) x group by x.[day], x.[name]
which returns, for me:
day name count
----------- ---- -----------
1 A 2
2 a 2
Using min(x.[name])
has no effect, since the grouping already happened.
I can't add an order by
before the group by
, as that is illegal; and adding the order by
after the group by
just defines the output order after the grouping - it still gives a
and A
.
So: is there a sane way of doing this where the capitalization will at least be consistent for all the groupings? (I'll leave for another day the problem of being consistent for separate runs)
Desired output, either:
day name count
----------- ---- -----------
1 A 2
2 A 2
or:
day name count
----------- ---- -----------
1 a 2
2 a 2
Edit: without destroying capitalisation when it is consistent between groups. So no upper/lower. So if one of the groups consistently has the value BcDeF
, I want the result of that row to be BcDeF
, not bcdef
or BCDEF
.
I would use windowing functions for this. By using ROW_NUMBER
and partitioning using a case insensitive collation, but ordering by a case sensitive one, we will choose consistently one result with the original capitalisation, but it will group them as if they are the same:
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY [day], [name]
ORDER BY [name] COLLATE SQL_Latin1_General_Cp1_Cs_AS),
N = COUNT(*) OVER(PARTITION BY [day], [name])
FROM ( select 1 as [day], 'a' as [name]
union all select 1, 'A'
union all select 2, 'A'
union all select 2, 'a'
union all select 3, 'BcDeF'
union all select 3, 'bCdEf') X
)
SELECT *
FROM CTE
WHERE RN = 1;
It returns:
╔═════╦═══════╦════╦═══╗
║ day ║ name ║ RN ║ N ║
╠═════╬═══════╬════╬═══╣
║ 1 ║ A ║ 1 ║ 2 ║
║ 2 ║ A ║ 1 ║ 2 ║
║ 3 ║ BcDeF ║ 1 ║ 2 ║
╚═════╩═══════╩════╩═══╝
Following @AndriyM's comment, if you want the same capitalisation over the whole result set, and not just the same day, you can use:
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY [day], [name]
ORDER BY [name] COLLATE SQL_Latin1_General_Cp1_Cs_AS),
N = COUNT(*) OVER(PARTITION BY [day], [name])
FROM ( select 1 as [day], 'a' as [name]
union all select 1, 'A'
union all select 2, 'A'
union all select 2, 'a'
union all select 3, 'BcDeF'
union all select 3, 'bCdEf') X
)
SELECT [day],
MAX([name] COLLATE SQL_Latin1_General_Cp1_CS_AS) OVER (PARTITION BY [name]) [name],
N
FROM CTE
WHERE RN = 1;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With