Collapse and summarize counts within group by

I have the following set of data:

SalesPerson PackageHistoryID    PackageID   SalesPersonID   EnrollmentAmount    PackageType
-------------------------------------------------------------------------------------------
Jim Jones   2895                310         59019           27.15               New Member
Jim Jones   2895                310         59019           53.21               New Member
Jim Jones   2895                310         59019           42.35               New Member
Jim Jones   2916                221         59019           379.01              Renewal
Jim Jones   2932                326         59019           53.21               New Member
Jim Jones   2932                326         59019           27.15               New Member
Jim Jones   2933                326         59019           53.21               Renewal
Jim Jones   2933                326         59019           27.15               Renewal

On that data set I run the following query:

select Salesperson, PackageType, count(*) AS Packages, sum(EnrollmentAmount) AS Enrollment
from Sales2
group by SalesPerson, PackageType
order by SalesPerson, PackageType

...and I get these results:

Salesperson    PackageType    Packages     Enrollment
----------------------------------------------------
Jim Jones      New Member     5            203.07
Jim Jones      Renewal        3            459.37
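To see where the 5 and 3 come from, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and runs the query above. SQLite is an assumption, since the question does not name the database engine. The column types and the helper names `load_sales` and `original_query` are hypothetical. The only change to the query is `round(..., 2)`, which hides floating-point noise from summing REAL values. Each PackageHistoryID appears once per EnrollmentAmount row, so `count(*)` counts rows, not package histories.

```python
import sqlite3

# Sample rows copied from the question:
# (SalesPerson, PackageHistoryID, PackageID, SalesPersonID, EnrollmentAmount, PackageType)
ROWS = [
    ("Jim Jones", 2895, 310, 59019, 27.15, "New Member"),
    ("Jim Jones", 2895, 310, 59019, 53.21, "New Member"),
    ("Jim Jones", 2895, 310, 59019, 42.35, "New Member"),
    ("Jim Jones", 2916, 221, 59019, 379.01, "Renewal"),
    ("Jim Jones", 2932, 326, 59019, 53.21, "New Member"),
    ("Jim Jones", 2932, 326, 59019, 27.15, "New Member"),
    ("Jim Jones", 2933, 326, 59019, 53.21, "Renewal"),
    ("Jim Jones", 2933, 326, 59019, 27.15, "Renewal"),
]


def load_sales(conn: sqlite3.Connection) -> None:
    """Create a Sales2 table (hypothetical column types) and insert the sample rows."""
    conn.execute(
        """create table Sales2 (
               SalesPerson text, PackageHistoryID integer, PackageID integer,
               SalesPersonID integer, EnrollmentAmount real, PackageType text)"""
    )
    conn.executemany("insert into Sales2 values (?, ?, ?, ?, ?, ?)", ROWS)


def original_query(conn: sqlite3.Connection) -> list:
    # The query from the question. count(*) counts rows, so PackageHistoryID
    # 2895 (three EnrollmentAmount rows) adds 3 to the "New Member" count.
    # round(..., 2) is added only to hide floating-point noise in the sums.
    return conn.execute(
        """select Salesperson, PackageType, count(*) AS Packages,
                  round(sum(EnrollmentAmount), 2) AS Enrollment
           from Sales2
           group by SalesPerson, PackageType
           order by SalesPerson, PackageType"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_sales(conn)
for row in original_query(conn):
    print(row)
```

The printed rows match the result table above: Packages is 5 and 3 because rows, not PackageHistoryIDs, are being counted.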

My final results as shown above are almost perfect. The only problem is the counts in the Packages column. Instead of 5 and 3, the counts should be 2 and 2, because I want Packages to count the distinct PackageHistoryIDs for each PackageType, not the EnrollmentAmount rows. I want the EnrollmentAmounts summed so the records are compressed such that PackageHistoryID never repeats. The first data set shows a one-to-many relationship between PackageHistory records and EnrollmentAmounts. I thought my query (the group by) would aggregate this correctly, but as you can see it counts 8 PackageHistories in total when there are really only 4.

Here is how the final result set should look:

Salesperson    PackageType    Packages     Enrollment
----------------------------------------------------
Jim Jones      New Member     2            203.07
Jim Jones      Renewal        2            459.37

The 2 and 2 reflect the fact that there are really only 4 PackageHistory records in the data set: 2 are New Member and 2 are Renewal. Because each PackageHistoryID can have several EnrollmentAmount rows, those extra rows inflate the counts in the final query.

Important note: Although SalesPerson is the same in every row shown, it can differ in other data; it is always the same within any given PackageHistory (1-1). The grouping needs to be (1) by SalesPerson, then (2) by PackageType, with the EnrollmentAmounts summed and flattened within each unique PackageHistory.
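The grouping described in the note can also be written literally as two steps: first collapse the rows to one per PackageHistoryID (summing its EnrollmentAmounts), then group those collapsed rows by SalesPerson and PackageType. The sketch below shows that alternative; it is not the query from the answer. It assumes SQLite in memory, hypothetical helper names (`load_sales`, `two_step_summary`), and that both SalesPerson and PackageType are constant within a PackageHistoryID. The note states this for SalesPerson, and the sample data shows it for PackageType. If PackageType could vary within one history, the inner query would emit more than one row for that history.

```python
import sqlite3

# Sample rows copied from the question:
# (SalesPerson, PackageHistoryID, PackageID, SalesPersonID, EnrollmentAmount, PackageType)
ROWS = [
    ("Jim Jones", 2895, 310, 59019, 27.15, "New Member"),
    ("Jim Jones", 2895, 310, 59019, 53.21, "New Member"),
    ("Jim Jones", 2895, 310, 59019, 42.35, "New Member"),
    ("Jim Jones", 2916, 221, 59019, 379.01, "Renewal"),
    ("Jim Jones", 2932, 326, 59019, 53.21, "New Member"),
    ("Jim Jones", 2932, 326, 59019, 27.15, "New Member"),
    ("Jim Jones", 2933, 326, 59019, 53.21, "Renewal"),
    ("Jim Jones", 2933, 326, 59019, 27.15, "Renewal"),
]


def load_sales(conn: sqlite3.Connection) -> None:
    """Create a Sales2 table (hypothetical column types) and insert the sample rows."""
    conn.execute(
        """create table Sales2 (
               SalesPerson text, PackageHistoryID integer, PackageID integer,
               SalesPersonID integer, EnrollmentAmount real, PackageType text)"""
    )
    conn.executemany("insert into Sales2 values (?, ?, ?, ?, ?, ?)", ROWS)


def two_step_summary(conn: sqlite3.Connection) -> list:
    # Inner query: one row per PackageHistoryID with its amounts summed.
    # SalesPerson and PackageType are in the GROUP BY because they are
    # assumed constant per PackageHistoryID.
    # Outer query: each collapsed row is one package, so count(*) is correct.
    return conn.execute(
        """select SalesPerson, PackageType, count(*) AS Packages,
                  round(sum(HistoryAmount), 2) AS Enrollment
           from (
               select PackageHistoryID, SalesPerson, PackageType,
                      sum(EnrollmentAmount) AS HistoryAmount
               from Sales2
               group by PackageHistoryID, SalesPerson, PackageType
           ) AS per_history
           group by SalesPerson, PackageType
           order by SalesPerson, PackageType"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_sales(conn)
for row in two_step_summary(conn):
    print(row)
```

The inner query is also a convenient place to keep per-history totals if you later need them, at the cost of a nested query that `count(distinct ...)` avoids.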

What query will give me correct results?

Asked Apr 02 '15 at 20:04 by HerrimanCoder



1 Answer

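You should use count(distinct PackageHistoryID) instead of count(*). count(*) counts every row in the group, so a PackageHistoryID with several EnrollmentAmount rows is counted once per row; count(distinct PackageHistoryID) counts each PackageHistoryID only once. sum(EnrollmentAmount) still adds every row, so the Enrollment totals do not change: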

select Salesperson, PackageType, count(distinct PackageHistoryID) AS Packages,
       sum(EnrollmentAmount) AS Enrollment
from Sales2
group by SalesPerson, PackageType
order by SalesPerson, PackageType
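A quick way to check this query against the question's sample rows is the sketch below. It assumes an in-memory SQLite database, which may not be the asker's engine. The column types and the helper names `load_sales` and `answer_query` are hypothetical. The only change to the query is `round(..., 2)`, added to hide floating-point noise in the REAL sums.

```python
import sqlite3

# Sample rows copied from the question:
# (SalesPerson, PackageHistoryID, PackageID, SalesPersonID, EnrollmentAmount, PackageType)
ROWS = [
    ("Jim Jones", 2895, 310, 59019, 27.15, "New Member"),
    ("Jim Jones", 2895, 310, 59019, 53.21, "New Member"),
    ("Jim Jones", 2895, 310, 59019, 42.35, "New Member"),
    ("Jim Jones", 2916, 221, 59019, 379.01, "Renewal"),
    ("Jim Jones", 2932, 326, 59019, 53.21, "New Member"),
    ("Jim Jones", 2932, 326, 59019, 27.15, "New Member"),
    ("Jim Jones", 2933, 326, 59019, 53.21, "Renewal"),
    ("Jim Jones", 2933, 326, 59019, 27.15, "Renewal"),
]


def load_sales(conn: sqlite3.Connection) -> None:
    """Create a Sales2 table (hypothetical column types) and insert the sample rows."""
    conn.execute(
        """create table Sales2 (
               SalesPerson text, PackageHistoryID integer, PackageID integer,
               SalesPersonID integer, EnrollmentAmount real, PackageType text)"""
    )
    conn.executemany("insert into Sales2 values (?, ?, ?, ?, ?, ?)", ROWS)


def answer_query(conn: sqlite3.Connection) -> list:
    # count(distinct PackageHistoryID) counts each history once per group,
    # while sum(EnrollmentAmount) still adds every row in the group.
    return conn.execute(
        """select Salesperson, PackageType,
                  count(distinct PackageHistoryID) AS Packages,
                  round(sum(EnrollmentAmount), 2) AS Enrollment
           from Sales2
           group by SalesPerson, PackageType
           order by SalesPerson, PackageType"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_sales(conn)
for row in answer_query(conn):
    print(row)
```

With the sample data this gives Packages of 2 and 2 and the same Enrollment totals as before, matching the desired result set in the question.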
Answered Oct 03 '22 at 06:10 by Giorgos Betsos