Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL - How to do a group by without having to pass all the columns from the select?

Tags:

sql

firebird

I have the following select, whose goal is to select all customers who had no sales since the day X, and also bringing the date of the last sale and the number of the sale:

select s.customerId, s.saleId, max (s.date) from sales s
group by s.customerId, s.saleId
having max(s.date) <= '05-16-2013'

But this way it brings me the following:

19 | 300 | 26/09/2005
19 | 356 | 29/09/2005
27 | 842 | 10/05/2012

In another words, the first 2 lines are from the same customer(id 19), i wish he'd get only one record for each client, which would be the record with the max date, in the case, the second record from this list. By that logic, i should take off s.saleId from the "group by" clause, but if i do, of course, i get the error: Invalid expression in the select list (not contained in either an aggregate function or the GROUP BY clause).

I'm using firebird 1.5

How can I do this?

like image 869
Mateus Viccari Avatar asked May 16 '13 12:05

Mateus Viccari


People also ask

Do all SELECT columns need to be in GROUP BY?

If a SELECT clause contains column expressions that are not aggregate expressions, and if a GROUP BY clause is specified, those column expressions must be in the GROUP BY clause.

Do I have to include everything in GROUP BY?

All fields in the SELECT field list must either be included in the GROUP BY clause or be included as arguments to an SQL aggregate function.

How does GROUP BY work with multiple columns in SQL?

The GROUP BY clause is used along with some aggregate functions to group columns with the same values in different rows. The group by multiple columns technique retrieves grouped column values from one or more database tables by considering more than one column as grouping criteria.

How do I exclude a column from a GROUP BY?

To exclude column from GROUP BY clause we can use Analytical function. In the above example, we have used multiple group functions on multiple combinations of columns.


2 Answers

GROUP BY summarizes data by aggregating a group of rows, returning one row per group. You're using the aggregate function max(), which will return the maximum value from one column for a group of rows.

Let's look at some data. I renamed the column you called "date".

create table sales (
  customerId integer not null,
  saleId integer not null,
  saledate date not null
  );


insert into sales values
(1, 10, '2013-05-13'),
(1, 11, '2013-05-14'),
(1, 12, '2013-05-14'),
(1, 13, '2013-05-17'),
(2, 20, '2013-05-11'),
(2, 21, '2013-05-16'),
(2, 31, '2013-05-17'),
(2, 32, '2013-03-01'),
(3, 33, '2013-05-14'),
(3, 35, '2013-05-14');

You said

In another words, the first 2 lines are from the same customer(id 19), i wish he'd get only one record for each client, which would be the record with the max date, in the case, the second record from this list.

select s.customerId, max (s.saledate) 
from sales s
where s.saledate <= '2013-05-16'
group by s.customerId
order by customerId;

customerId  max
--
1           2013-05-14 
2           2013-05-16
3           2013-05-14

What does that table mean? It means that the latest date on or before May 16 on which customer "1" bought something was May 14; the latest date on or before May 16 on which customer "2" bought something was May 16. If you use this derived table in joins, it will return predictable results with consistent meaning.

Now let's look at a slightly different query. MySQL permits this syntax, and returns the result set below.

select s.customerId, s.saleId, max(s.saledate) max_sale
from sales s
where s.saledate <= '2013-05-16'
group by s.customerId
order by customerId;

customerId  saleId  max_sale
--
1           10      2013-05-14
2           20      2013-05-16
3           33      2013-05-14

The sale with ID "10" didn't happen on May 14; it happened on May 13. This query has produced a falsehood. Joining this derived table with the table of sales transactions will compound the error.

That's why Firebird correctly raises an error. The solution is to drop saleId from the SELECT clause.

Now, having said all that, you can find the customers who have had no sales since May 16 like this.

select distinct customerId from sales
where customerID not in
  (select customerId
  from sales
  where saledate >= '2013-05-16')

And you can get the right customerId and the "right" saleId like this. (I say "right" saleId, because there could be more than one on the day in question. I just chose the max.)

select sales.customerId, sales.saledate, max(saleId)
from sales
inner join (select customerId, max(saledate) max_date
            from sales
            where saledate < '2013-05-16'
            group by customerId) max_dates
        on sales.customerId = max_dates.customerId
        and sales.saledate = max_dates.max_date
inner join (select distinct customerId 
            from sales
            where customerID not in
              (select customerId
               from sales
               where saledate >= '2013-05-16')) no_sales
        on sales.customerId = no_sales.customerId
group by sales.customerId, sales.saledate

Personally, I find common table expressions make it easier for me to read SQL statements like that without getting lost in the SELECTs.

with no_sales as (
  select distinct customerId 
  from sales
  where customerID not in
    (select customerId
     from sales
     where saledate >= '2013-05-16')
),
max_dates as (
  select customerId, max(saledate) max_date
  from sales
  where saledate < '2013-05-16'
  group by customerId
)
select sales.customerId, sales.saledate, max(saleId)
from sales
inner join max_dates
        on sales.customerId = max_dates.customerId
        and sales.saledate = max_dates.max_date
inner join no_sales
        on sales.customerId = no_sales.customerId
group by sales.customerId, sales.saledate
like image 78
Mike Sherrill 'Cat Recall' Avatar answered Sep 29 '22 06:09

Mike Sherrill 'Cat Recall'


then you can use following query ..

EDIT changes made after comment by likeitlikeit for only one row per CustomerID even when we will have one case where we have multiple saleID for customer with certain condition -

select x.customerID, max(x.saleID), max(x.x_date) from (
select s.customerId, s.saleId, max (s.date) x_date from sales s
group by s.customerId, s.saleId
having max(s.date) <= '05-16-2013'
   and max(s.date) = ( select max(s1.date) 
                         from sales s1 
                        where s1.customeId = s.customerId))x

group by x.customerID

like image 25
pratik garg Avatar answered Sep 29 '22 06:09

pratik garg