I am having a harder time than I would expect writing a query that includes aggregate functions, queries multiple tables, and results in accurate numbers and hoping I can get some help.
SQL Fiddle
The category is what I need to report on:
|----|-----------|
| id | name |
|----|-----------|
| 1 | furniture |
| 2 | music |
| 3 | kitchen |
| 4 | adventure |
|----|-----------|
Product table example:
|-----|----------------|-------------|
| id | name | category_id |
|-----|----------------|-------------|
| 101 | couch | 1 |
| 102 | chair | 1 |
| 103 | drum | 2 |
| 104 | flute | 2 |
| 105 | pot | 3 |
| 106 | pan | 3 |
| 107 | kitchen sink | 3 |
| 108 | unicorn saddle | 4 |
| 109 | unicorn shoes | 4 |
| 110 | horse shampoo | 4 |
|-----|----------------|-------------|
The views data we want to sum (by category) found in the activity table:
|----|------------|-------|
| id | product_id | views |
|----|------------|-------|
| 1 | 101 | 1000 |
| 2 | 102 | 2000 |
| 3 | 103 | 3000 |
| 4 | 104 | 4000 |
| 5 | 105 | 5000 |
| 6 | 106 | 6000 |
| 7 | 107 | 7000 |
| 8 | 108 | 8000 |
| 9 | 109 | 9000 |
| 10 | 110 | 10000 |
|----|------------|-------|
The sales table that we want to put query the average sale (again by category). Note that the vendor_id is important as a single product can be carried by multiple vendors. I am leaving out the vendor table as its not needed for this question (we can just query using the vendor ID in later examples).
|----|------------|-----------|--------|
| id | product_id | vendor_id | amount |
|----|------------|-----------|--------|
| 1 | 101 | 1 | 1000 |
| 2 | 102 | 1 | 900 |
| 3 | 103 | 1 | 2000 |
| 4 | 105 | 1 | 3000 |
| 5 | 107 | 1 | 5000 |
| 6 | 101 | 2 | 600 |
| 7 | 103 | 2 | 7000 |
| 8 | 105 | 2 | 8000 |
| 9 | 107 | 2 | 1000 |
| 10 | 108 | 1 | 500 |
| 11 | 109 | 1 | 600 |
| 12 | 108 | 2 | 400 |
| 13 | 109 | 2 | 500 |
|----|------------|-----------|--------|
Below is the desired output:
**Note that some vendors do not carry some products, therefore, meaning that there is no average sales amount..or, in other words, there are no records in the sales table for some products found in the product table (e.g. no vendors carry horse shampoo). For this reason, I want to make sure that any averages or sums that I am using are in fact accurate. To be specific here, if a **.
|-----------|----------------|-----------|---------------|-------------------------------|-------------------------|
| category | count_products | sum_views | average_sales | sum_views_where_sales_=>_1000 | sum_views_sales_<_1000 |
|-----------|----------------|-----------|---------------|-------------------------------|-------------------------|
| adventure | 3 | 27000 | 500 | 0 | 27000 |
| furniture | 2 | 3000 | 833 | 0 | 3000 |
| kitchen | 3 | 18000 | 3000 | 6000 | 12000 |
| music | 2 | 7000 | 5000 | 7000 | 0 |
|-----------|----------------|-----------|---------------|-------------------------------|-------------------------|
First to get an accurate count on products and views:
SELECT cat.name AS category,
count(distinct p.name) AS product,
sum(a.views) AS views
FROM
category AS cat,
product AS p,
activity AS a
WHERE
cat.id=p.category_id
AND
p.id=a.product_id
GROUP BY
category;
sidenote: I would love to not have to use the distinct in the above query. Any ideas here would be great.
Accurate result displaying views by category:
|-----------|---------|-------|
| category | product | views |
|-----------|---------|-------|
| Adventure | 3 | 27000 |
| Furniture | 2 | 3000 |
| Kitchen | 3 | 18000 |
| Music | 2 | 7000 |
|-----------|---------|-------|
Everything looks good until I start joining to the other tables:
SELECT cat.name AS category,
count(distinct p.name) AS product,
sum(a.views) AS views,
round(avg(s.amount)) AS sales_amount
FROM
category AS cat,
product AS p,
activity AS a,
sales AS s
WHERE
cat.id=p.category_id
AND
p.id=a.product_id
AND
p.id=s.product_id
AND
s.vendor_id=1
GROUP BY
category;
PROBLEM OUTPUT
|-----------|---------|-------|------------------|
| category | product | views | avg_sales_amount |
|-----------|---------|-------|------------------|
| Adventure | 2 | 17000 | 550 |
| Furniture | 2 | 3000 | 950 |
| Kitchen | 2 | 12000 | 4000 |
| Music | 1 | 3000 | 2000 |
|-----------|---------|-------|------------------|
As you can notice I am getting further from the desired output when I start querying by vendor_id to get the average sales amount. To be specific, the product column no longer results in the correct amount of products because not all vendors carry all the same products making the s.vendor_id=1 filter difficult. I have to use it to be able to filter these reports by vendor while still getting accurate sums on the view field.
I have tried the above queries using a LEFT JOIN but still ends up with innacurrate results and not sure what needs to happen, possibly a sub query of some sort?
Your report requirement is surprisingly complex. You may have gone into this project thinking it is far simpler than it is.
This is a case where you are reporting on summaries based on independently tabulated measures (views and sales).
Therefore, you need to start with aggregate subqueries that do not join the two detailed measurement tables together. This is one such query. It gets you views by category. http://sqlfiddle.com/#!9/02f4b6/31/0
SELECT c.id category_id, SUM(a.views) views
FROM activity a
JOIN product p ON a.product_id = p.id
JOIN category c ON p.category_id = c.id
GROUP BY c.id
This is the other such query. It gets you sales by category. http://sqlfiddle.com/#!9/02f4b6/32/0
SELECT c.id category_id,
SUM(s.amount) total_sales,
AVG(s.amount) avg_sales
FROM sales s
JOIN product p ON s.product_id = p.id
JOIN category c ON p.category_id = c.id
GROUP BY c.id
Next you need the count of products by category. Fortunately each product can only be in one category. http://sqlfiddle.com/#!9/02f4b6/42/0
SELECT c.id category_id,
COUNT(*) products
FROM product p
JOIN category c ON p.category_id = c.id
GROUP BY c.id
Now, it's necessary to join these items together. Start with the category
table and LEFT JOIN
the other three, like this. http://sqlfiddle.com/#!9/02f4b6/51/0
SELECT c.name, aggproducts.products,
aggviews.views, aggsales.avg_sales,
aggsales.total_sales
FROM category c
LEFT JOIN (
SELECT c.id category_id, SUM(a.views) views
FROM activity a
JOIN product p ON a.product_id = p.id
JOIN category c ON p.category_id = c.id
GROUP BY c.id
) aggviews ON c.id = aggviews.category_id
LEFT JOIN (
SELECT c.id category_id,
SUM(s.amount) total_sales,
AVG(s.amount) avg_sales
FROM sales s
JOIN product p ON s.product_id = p.id
JOIN category c ON p.category_id = c.id
GROUP BY c.id
) aggsales ON c.id = aggsales.category_id
LEFT JOIN (
SELECT c.id category_id,
COUNT(*) products
FROM product p
JOIN category c ON p.category_id = c.id
GROUP BY c.id
) aggproducts ON c.id = aggproducts.category_id
The trick is to create aggregate subqueries for each measure that contain either zero or one rows per category. If any aggregate subqueries contains more than one row per category, you start getting duplication of rows because of JOIN combinatorial explosion.
Then you LEFT JOIN
those aggregate subqueries to the category table. Don't use ordinary JOIN
because that will suppress rows from the result if any of the aggregate subqueries are missing the particular category.
Notice that you are using these subqueries as if they were tables. This ability to build up queries from subqueries is what puts the structured in structured query language.
Those are the basics. Now you need yet another aggregate subquery for those conditional sums. I'm going to leave that one to you.
Please try the following...
SELECT Category.name AS category,
COUNT( * ) AS count_Product,
SUM( views ) AS sum_views,
ROUND( COALESCE( SUM( sumAmount ) / SUM( countAmounts ), 0 ) ) AS average_sales,
SUM( whereGreater ) AS 'sum_views_where_sales_=>_1000',
SUM( whereLesser ) AS 'sum_views_sales_<_1000'
FROM Category
JOIN Product ON Category.id = Product.category_id
JOIN Activity ON Product.id = Activity.product_id
LEFT JOIN ( SELECT product_id AS product_id,
SUM( amount ) AS sumAmount,
COUNT( * ) AS countAmounts
FROM Sales
GROUP BY product_id
) sumCountAmountFinder ON Product.id = sumCountAmountFinder.product_id
JOIN ( SELECT Activity.product_id AS product_id,
SUM( CASE WHEN COALESCE( meanAmount, 0 ) >= 1000 THEN views ELSE 0 END ) AS whereGreater,
SUM( CASE WHEN COALESCE( meanAmount, 0 ) < 1000 THEN views ELSE 0 END ) AS whereLesser
FROM Activity
LEFT JOIN ( SELECT product_id AS product_id,
SUM( amount ) / COUNT( * ) AS meanAmount
FROM Sales
GROUP BY product_id
) AS meanAmountFinder ON Activity.product_id = meanAmountFinder.product_id
GROUP BY Activity.product_id
) sumWhereFinder ON Product.id = sumWhereFinder.product_id
GROUP BY Category.name;
Assumptions
That a record in Category
will always have at least one record in Product
associated with it.
That every record in Product
will have a corresponding record in Activity
.
Explanation
My statement starts by performing an INNER JOIN
between Category
and Product
, effectively giving us a list of Products
associated with each Category
.
An INNER JOIN
is performed between Activity
and the above joined dataset, effectively appending each value of the views
field to its corresponding Product
record.
The following subquery is then used to determine the total of amount
for each product_id
IN Sales
and the number of records for each product_id
. A LEFT JOIN
is then performed between Product
and and the subquery, effectively appending each record from the subquery to its corresponding Product
in the above joined dataset. A LEFT JOIN
was used rather than an INNER JOIN
as not all records from Product
will have corresponding records in Sales
and we do not wish for any record from Product
to be excluded on the basis of that absence.
SELECT product_id AS product_id,
SUM( amount ) AS sumAmount,
COUNT( * ) AS countAmounts
FROM Sales
GROUP BY product_id
Another subquery (as follows) is then used to calculate the mean average of amount
for each product_id
in Sales
.
SELECT product_id AS product_id,
SUM( amount ) / COUNT( * ) AS meanAmount
FROM Sales
GROUP BY product_id
A LEFT JOIN
is then performed between Activity
and the results of the subquery. The mean average of amount
for each product_id
is then compared to 1000
and that product_id
's corresponding value of views
is placed in the appropriate field, with 0
being placed in the other. If a value of product_id
does not have any corresponding records in Sales
, then 0
is placed in both fields. The resulting produced by the parent subquery (which follows) is then joined to the above joined dataset.
SELECT Activity.product_id AS product_id,
SUM( CASE WHEN COALESCE( meanAmount, 0 ) >= 1000 THEN views ELSE 0 END ) AS whereGreater,
SUM( CASE WHEN COALESCE( meanAmount, 0 ) < 1000 THEN views ELSE 0 END ) AS whereLesser
FROM Activity
LEFT JOIN ( SELECT product_id AS product_id,
SUM( amount ) / COUNT( * ) AS meanAmount
FROM Sales
GROUP BY product_id
) AS meanAmountFinder ON Activity.product_id = meanAmountFinder.product_id
GROUP BY Activity.product_id
With our final joined dataset now formed, the records are then grouped by their value of Category.name
. Each group's value of Category.name
and its corresponding aggregated values are calculated and returned.
Testing
My statement was tested against a sample database created using the following code...
CREATE TABLE Category
(
id INT,
name VARCHAR( 50 )
);
INSERT INTO Category ( id,
name )
VALUES ( 1, 'furniture' ),
( 2, 'music' ),
( 3, 'kitchen' ),
( 4, 'adventure' );
CREATE TABLE Product
(
id INT,
name VARCHAR( 50 ),
category_id INT
);
INSERT INTO Product ( id,
name,
category_id )
VALUES ( 101, 'couch', 1 ),
( 102, 'chair', 1 ),
( 103, 'drum', 2 ),
( 104, 'flute', 2 ),
( 105, 'pot', 3 ),
( 106, 'pan', 3 ),
( 107, 'kitchen sink', 3 ),
( 108, 'unicorn saddle', 4 ),
( 109, 'unicorn shoes', 4 ),
( 110, 'horse shampoo', 4 );
CREATE TABLE Activity
(
id INT,
product_id INT,
views INT
);
INSERT INTO Activity ( id,
product_id,
views )
VALUES ( 1, 101, 1000 ),
( 2, 102, 2000 ),
( 3, 103, 3000 ),
( 4, 104, 4000 ),
( 5, 105, 5000 ),
( 6, 106, 6000 ),
( 7, 107, 7000 ),
( 8, 108, 8000 ),
( 9, 109, 9000 ),
( 10, 110, 10000 );
CREATE TABLE Sales
(
id INT,
product_id INT,
vendor_id INT,
amount INT
);
INSERT INTO Sales ( id,
product_id,
vendor_id,
amount )
VALUES ( 1, 101, 1, 1000 ),
( 2, 102, 1, 900 ),
( 3, 103, 1, 2000 ),
( 4, 105, 1, 3000 ),
( 5, 107, 1, 5000 ),
( 6, 101, 2, 600 ),
( 7, 103, 2, 7000 ),
( 8, 105, 2, 8000 ),
( 9, 107, 2, 1000 ),
( 10, 108, 1, 500 ),
( 11, 109, 1, 600 ),
( 12, 108, 2, 400 ),
( 13, 109, 2, 500 );
If you have any questions or comments, then please feel free to post a Comment accordingly.
Further Reading
https://dev.mysql.com/doc/refman/5.7/en/case.html (on MySQL's CASE
statement)
https://dev.mysql.com/doc/refman/5.7/en/comparison-operators.html#function_coalesce (on MySQL's COALESCE()
function)
https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count (on MySQL's 'COUNT()` aggregate function)
https://www.w3schools.com/sql/sql_join.asp (on various types of horizontal JOIN
's in SQL)
https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_round (on MySQL's ROUND()
function)
https://dev.mysql.com/doc/refman/5.7/en/select.html (on MySQL's SELECT
statement)
https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_sum (on MySQL's SUM()
aggregate function)
Appendix
If both of my Assumptions are incorrect, then please try the following code...
SELECT Category.name AS category,
COUNT( * ) AS count_Product,
COALESCE( SUM( views ), '' ) AS sum_views,
COALESCE( ROUND( SUM( sumAmount ) / SUM( countAmounts ), '' ) ) AS average_sales,
COALESCE( SUM( whereGreater ), '' ) AS 'sum_views_where_sales_=>_1000',
COALESCE( SUM( whereLesser ), '' ) AS 'sum_views_sales_<_1000'
FROM Category
LEFT JOIN Product ON Category.id = Product.category_id
LEFT JOIN Activity ON Product.id = Activity.product_id
LEFT JOIN ( SELECT product_id AS product_id,
SUM( amount ) AS sumAmount,
COUNT( * ) AS countAmounts
FROM Sales
GROUP BY product_id
) sumCountAmountFinder ON Product.id = sumCountAmountFinder.product_id
LEFT JOIN ( SELECT Activity.product_id AS product_id,
SUM( CASE WHEN COALESCE( meanAmount, 0 ) >= 1000 THEN views ELSE 0 END ) AS whereGreater,
SUM( CASE WHEN COALESCE( meanAmount, 0 ) < 1000 THEN views ELSE 0 END ) AS whereLesser
FROM Activity
LEFT JOIN ( SELECT product_id AS product_id,
SUM( amount ) / COUNT( * ) AS meanAmount
FROM Sales
GROUP BY product_id
) AS meanAmountFinder ON Activity.product_id = meanAmountFinder.product_id
GROUP BY Activity.product_id
) sumWhereFinder ON Product.id = sumWhereFinder.product_id
GROUP BY Category.name;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With