I have a working query that is grouping data by hardware model and a result, but the problem is there are many "results". I have tried to reduce that down to "if result = 0 then keep as 0, else set it to 1". This generally works, but I end up having:
day | name | type | case | count
------------+----------------+------+------+-------
2013-11-06 | modelA | 1 | 0 | 972
2013-11-06 | modelA | 1 | 1 | 42
2013-11-06 | modelA | 1 | 1 | 2
2013-11-06 | modelA | 1 | 1 | 11
2013-11-06 | modelB | 1 | 0 | 456
2013-11-06 | modelB | 1 | 1 | 16
2013-11-06 | modelB | 1 | 1 | 8
2013-11-06 | modelB | 3 | 0 | 21518
2013-11-06 | modelB | 3 | 1 | 5
2013-11-06 | modelB | 3 | 1 | 7
2013-11-06 | modelB | 3 | 1 | 563
Instead of the aggregate I am trying to achieve, where only 1 row per type/case combo.
day | name | type | case | count
------------+----------------+------+------+-------
2013-11-06 | modelA | 1 | 0 | 972
2013-11-06 | modelA | 1 | 1 | 55
2013-11-06 | modelB | 1 | 0 | 456
2013-11-06 | modelB | 1 | 1 | 24
2013-11-06 | modelB | 3 | 0 | 21518
2013-11-06 | modelB | 3 | 1 | 575
Here is my query:
select CURRENT_DATE-1 AS day, model.name, attempt.type,
CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END,
count(*)
from attempt attempt, prod_hw_id prod_hw_id, model model
where time >= '2013-11-06 00:00:00'
AND time < '2013-11-07 00:00:00'
AND attempt.hard_id = prod_hw_id.hard_id
AND prod_hw_id.model_id = model.model_id
group by model.name, attempt.type, attempt.result
order by model.name, attempt.type, attempt.result;
Any tips on how I can achieve this would be awesome.
Day will always be defined in the WHERE
clause, so it will not vary. name, type, result(case)
and count
will vary. In short, for any given model I want only 1 row per "type + case" combo. As you can see in the first result set I have 3 rows for modelA
that have type=1
and case=1
(because there are many "result" values that I have turned into 0=0 and anything else=1). I want that to be represented as 1 row with the count aggregated as in example data set 2.
The case statement in SQL returns a value on a specified condition. We can use a Case statement in select queries along with Where, Order By, and Group By clause. It can be used in the Insert statement as well.
The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". The GROUP BY statement is often used with aggregate functions ( COUNT() , MAX() , MIN() , SUM() , AVG() ) to group the result-set by one or more columns.
Syntax: SELECT column1, function_name(column2) FROM table_name WHERE condition GROUP BY column1, column2 HAVING condition ORDER BY column1, column2; function_name: Name of the function used for example, SUM() , AVG(). table_name: Name of the table. condition: Condition used.
The GROUP BY clause is used to get the summary data based on one or more groups. The groups can be formed on one or more columns. For example, the GROUP BY query will be used to count the number of employees in each department, or to get the department wise total salaries.
Your query would work already - except that you are running into naming conflicts or just confusing the output column (the CASE
expression) with source column result
, which has different content.
... GROUP BY model.name, attempt.type, attempt.result ...
You need to GROUP BY
your CASE
expression instead of your source column:
... GROUP BY model.name, attempt.type , CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END ...
Or provide a column alias that's different from any column name in the FROM
list - or else that column takes precedence:
SELECT ... , CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END AS result1 ... GROUP BY model.name, attempt.type, result1 ...
The SQL standard is rather peculiar in this respect. Quoting the manual here:
An output column's name can be used to refer to the column's value in
ORDER BY
andGROUP BY
clauses, but not in theWHERE
orHAVING
clauses; there you must write out the expression instead.
And:
If an
ORDER BY
expression is a simple name that matches both an output column name and an input column name,ORDER BY
will interpret it as the output column name. This is the opposite of the choice thatGROUP BY
will make in the same situation. This inconsistency is made to be compatible with the SQL standard.
Bold emphasis mine.
These conflicts can be avoided by using positional references (ordinal numbers) in GROUP BY
and ORDER BY
, referencing items in the SELECT
list from left to right. See solution below.
The drawback is, that this may be harder to read and vulnerable to edits in the SELECT
list (one might forget to adapt positional references accordingly).
But you do not have to add the column day
to the GROUP BY
clause, as long as it holds a constant value (CURRENT_DATE-1
).
Rewritten and simplified with proper JOIN syntax and positional references it could look like this:
SELECT m.name , a.type , CASE WHEN a.result = 0 THEN 0 ELSE 1 END AS result , CURRENT_DATE - 1 AS day , count(*) AS ct FROM attempt a JOIN prod_hw_id p USING (hard_id) JOIN model m USING (model_id) WHERE ts >= '2013-11-06 00:00:00' AND ts < '2013-11-07 00:00:00' GROUP BY 1,2,3 ORDER BY 1,2,3;
Also note that I am avoiding the column name time
. That's a reserved word and should never be used as identifier. Besides, your "time" obviously is a timestamp
or date
, so that is rather misleading.
can you please try this: replace the case statement with the below one
Sum(CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END) as Count,
Aliases can be used only if they were introduced in the preceding step. So aliases in the SELECT
clause can be used in the ORDER BY
but not the GROUP BY
clause.
Reference: Microsoft T-SQL Documentation for further reading.
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
Hope this helps.
For TSQL I like to encapsulate case statements in an outer apply. This prevents me from having to have the case statement written twice, allows reference to the case statement by alias in future joins and avoids the need for positional references.
select oa.day,
model.name,
attempt.type,
oa.result
COUNT(*) MyCount
FROM attempt attempt, prod_hw_id prod_hw_id, model model
WHERE time >= '2013-11-06 00:00:00'
AND time < '2013-11-07 00:00:00'
AND attempt.hard_id = prod_hw_id.hard_id
AND prod_hw_id.model_id = model.model_id
OUTER APPLY (
SELECT CURRENT_DATE-1 AS day,
CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END result
) oa
group by oa.day,
model.name,
attempt.type,
oa.result
order by model.name, attempt.type, oa.result;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With