I have the following set of data:
SalesPerson  PackageHistoryID  PackageID  SalesPersonID  EnrollmentAmount  PackageType
--------------------------------------------------------------------------------------
Jim Jones    2895              310        59019          27.15             New Member
Jim Jones    2895              310        59019          53.21             New Member
Jim Jones    2895              310        59019          42.35             New Member
Jim Jones    2916              221        59019          379.01            Renewal
Jim Jones    2932              326        59019          53.21             New Member
Jim Jones    2932              326        59019          27.15             New Member
Jim Jones    2933              326        59019          53.21             Renewal
Jim Jones    2933              326        59019          27.15             Renewal
Upon that data set I run the following query:
select Salesperson, PackageType, count(*) AS Packages, sum(EnrollmentAmount) AS Enrollment
from Sales2
group by SalesPerson, PackageType
order by SalesPerson, PackageType
...and I get these results:
Salesperson  PackageType  Packages  Enrollment
----------------------------------------------
Jim Jones    New Member   5         203.07
Jim Jones    Renewal      3         459.37
My final results as shown above are almost perfect. The only problem is the counts in the Packages column. Instead of 5 and 3, the counts should be 2 and 2, because I want the number of distinct PackageHistoryIDs for each PackageType, not the number of EnrollmentAmount rows. I want the EnrollmentAmounts summed so that the records are collapsed and no PackageHistoryID is counted more than once. The first data set shows a one-to-many relationship between PackageHistory records and EnrollmentAmounts. I thought my second query (the group by) would aggregate this correctly, but it reports 8 total PackageHistories when there are really only 4.
Here is how the final result set should look:
Salesperson  PackageType  Packages  Enrollment
----------------------------------------------
Jim Jones    New Member   2         203.07
Jim Jones    Renewal      2         459.37
The 2 and 2 indicate the fact that there are really only 4 PackageHistory records in the result set; 2 are New Member and 2 are Renewal. The multiple EnrollmentAmount records are causing too many records and thus the counts get wrongly expanded in the final query.
Important note: Although SalesPerson is always the same in the results shown, these can sometimes be different, though they will be the same for any given PackageHistory (1-1). The grouping needs to be (1) by SalesPerson, then (2) by PackageType, and summarize/flatten the EnrollmentAmounts within each unique PackageHistory.
What query will give me correct results?
The GROUP BY clause is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result set by one or more columns.
Using COUNT without a GROUP BY clause returns the total number of rows in the table. With GROUP BY, COUNT returns the number of rows for each unique value (or combination of values) in the grouped columns.
To count rows per group, pass COUNT a column that is never NULL, such as a unique id column, or use COUNT(*), and list the grouping columns in the GROUP BY clause. GROUP BY itself does not filter anything; to filter the grouped results on an aggregate such as COUNT, use the HAVING clause.
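As a rough illustration of the three cases above, here is a minimal sketch that loads a trimmed copy of the question's rows into an in-memory SQLite database. SQLite, the reduced three-column table, and the HAVING threshold of 3 are assumptions made for the example; the original post does not name a database engine.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales2 (PackageHistoryID INTEGER, PackageType TEXT, EnrollmentAmount REAL)"
)
conn.executemany(
    "INSERT INTO Sales2 VALUES (?, ?, ?)",
    [
        (2895, "New Member", 27.15),
        (2895, "New Member", 53.21),
        (2895, "New Member", 42.35),
        (2916, "Renewal", 379.01),
        (2932, "New Member", 53.21),
        (2932, "New Member", 27.15),
        (2933, "Renewal", 53.21),
        (2933, "Renewal", 27.15),
    ],
)

# COUNT without GROUP BY: one number for the whole table.
total = conn.execute("SELECT COUNT(*) FROM Sales2").fetchone()[0]

# COUNT with GROUP BY: one count per distinct PackageType.
per_type = conn.execute(
    "SELECT PackageType, COUNT(*) FROM Sales2 "
    "GROUP BY PackageType ORDER BY PackageType"
).fetchall()

# HAVING filters the groups after aggregation (threshold is arbitrary).
big_groups = conn.execute(
    "SELECT PackageType, COUNT(*) FROM Sales2 "
    "GROUP BY PackageType HAVING COUNT(*) > 3 ORDER BY PackageType"
).fetchall()

print(total)       # 8
print(per_type)    # [('New Member', 5), ('Renewal', 3)]
print(big_groups)  # [('New Member', 5)]
```

Note that the per-type counts here are row counts (5 and 3), which is exactly the behaviour the question is trying to avoid.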
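You should use count(distinct PackageHistoryID) instead of count(*). count(*) counts every row in each group, so a PackageHistory with several EnrollmentAmount rows is counted several times, whereas count(distinct PackageHistoryID) counts each PackageHistory once: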
select Salesperson, PackageType, count(distinct PackageHistoryID) AS Packages,
sum(EnrollmentAmount) AS Enrollment
from Sales2
group by SalesPerson, PackageType
order by SalesPerson, PackageType
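To check the query against the sample data, the sketch below runs it in an in-memory SQLite database. SQLite and the column types are assumptions, since the question does not say which engine is in use. It also tries an alternative formulation, not part of the answer above, that first collapses the rows to one per PackageHistoryID in a derived table (aliased per_history, a name chosen here for illustration) and then groups again. Sums are rounded to two decimals because the amounts are stored as floating point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (assumption)
conn.execute(
    """CREATE TABLE Sales2 (
           SalesPerson TEXT, PackageHistoryID INTEGER, PackageID INTEGER,
           SalesPersonID INTEGER, EnrollmentAmount REAL, PackageType TEXT)"""
)
conn.executemany(
    "INSERT INTO Sales2 VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Jim Jones", 2895, 310, 59019, 27.15, "New Member"),
        ("Jim Jones", 2895, 310, 59019, 53.21, "New Member"),
        ("Jim Jones", 2895, 310, 59019, 42.35, "New Member"),
        ("Jim Jones", 2916, 221, 59019, 379.01, "Renewal"),
        ("Jim Jones", 2932, 326, 59019, 53.21, "New Member"),
        ("Jim Jones", 2932, 326, 59019, 27.15, "New Member"),
        ("Jim Jones", 2933, 326, 59019, 53.21, "Renewal"),
        ("Jim Jones", 2933, 326, 59019, 27.15, "Renewal"),
    ],
)

# The query from the answer: count each PackageHistoryID once per group.
distinct_sql = """
select SalesPerson, PackageType,
       count(distinct PackageHistoryID) AS Packages,
       sum(EnrollmentAmount) AS Enrollment
from Sales2
group by SalesPerson, PackageType
order by SalesPerson, PackageType
"""

# Alternative sketch: flatten to one row per PackageHistoryID, then group.
subquery_sql = """
select SalesPerson, PackageType, count(*) AS Packages, sum(Enrollment) AS Enrollment
from (select SalesPerson, PackageType, PackageHistoryID,
             sum(EnrollmentAmount) AS Enrollment
      from Sales2
      group by SalesPerson, PackageType, PackageHistoryID) AS per_history
group by SalesPerson, PackageType
order by SalesPerson, PackageType
"""

def run(sql):
    """Run a query and round the summed amount to two decimals."""
    return [(sp, pt, n, round(amount, 2)) for sp, pt, n, amount in conn.execute(sql)]

distinct_result = run(distinct_sql)
subquery_result = run(subquery_sql)

for row in distinct_result:
    print(row)
# ('Jim Jones', 'New Member', 2, 203.07)
# ('Jim Jones', 'Renewal', 2, 459.37)
```

Both formulations give the same rows on this data. The derived-table version is more verbose, but it may be handy if other per-PackageHistory values need to be computed before the final grouping.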