Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL SUM of one column, DISTINCT of ID column

Tags:

mysql

I'm trying to create a summary report of our orders but having trouble extracting all my required data in a single query.

The data I'd like to extract:

  • subtotal - SUM of all sale prices
  • delivery total - SUM of all orders deliveryTotal
  • orders - COUNT of DISTINCT orderIds
  • quantity - SUM of all quantity ordered

Orders table (simplified for this example)

| orderId | deliveryTotal | total |
|---------|---------------|-------|
| 1       | 5             | 15    |
| 2       | 5             | 15    |
| 3       | 7.50          | 27.50 |

Order items table

| orderItemId | orderId | productId | salePrice | quantity |
|-------------|---------|-----------|-----------|----------|
| 1           | 1       | 1         | 10        | 1        |
| 2           | 2       | 1         | 10        | 1        |
| 3           | 3       | 1         | 10        | 1        |
| 4           | 3       | 2         | 10        | 1        |

My current query for extracting this data is

SELECT
    SUM(i.salePrice * i.quantity) as subtotal,
    SUM(DISTINCT o.deliveryTotal) as deliveryTotal,
    COUNT(DISTINCT o.orderId) as orders,
    SUM(i.quantity) as quantity
FROM orderItems i
INNER JOIN orders o ON o.orderId = i.orderId

Which results in a correct subtotal, order count and quantity sum. But delivery total is returned as 12.50 when I'm after 17.50. If I do SUM(o.deliveryTotal) it will return 25.

EDIT: Desired results

| subtotal | deliveryTotal | orders | quantity |
|----------|---------------|--------|----------|
| 40.00    | 17.50         | 3      | 4        |
like image 646
Nick Avatar asked Nov 30 '22 23:11

Nick


2 Answers

https://tiaashish.wordpress.com/2014/01/31/mysql-sum-for-distinct-rows-with-left-join/

Here is a blog post that shows exactly what I was looking for. Maybe this can help others too.

The formula is something like this:

SUM(o.deliveryTotal) * COUNT(DISTINCT o.orderId) / COUNT(*)

like image 147
GarryOne Avatar answered Dec 06 '22 17:12

GarryOne


Because of the join, the SUM(DISTINCT deliveryTotal) aggregate is being applied to a rowset including the values 5, 5, 7.5, 7.5 (distinct 5 + 7.5 = 12.5).

The rows your SUM() acted on become more apparent if you simply do

SELECT o.*
FROM orderItems i
INNER JOIN orders o ON o.orderId = i.orderId

Instead you are asking for the SUM() of all the values in deliveryTotal, irrespective of their position in the join with orderItems. That means you need to apply the aggregate at a different level.

Since you are not intending to add a GROUP BY later, the easiest way to do that is to use a subselect whose purpose is only to get the SUM() across the whole table.

SELECT
    SUM(i.salePrice * i.quantity) as subtotal,
    -- deliveryTotal sum as a subselect
    (SELECT SUM(deliveryTotal) FROM orders) as deliveryTotal,
    COUNT(DISTINCT o.orderId) as orders,
    SUM(i.quantity) as quantity
FROM orderItems i
INNER JOIN orders o ON o.orderId = i.orderId

Subselects are usually discouraged but there won't be a significant performance penalty for the subselect, none different from the alternative methods of using a join for it. The calculation has to be done on a separate aggregate from the existing join no matter what. Other methods would place a subquery CROSS JOIN in the FROM clause, which performs the same thing we placed here in the subselect. Performance would be the same.

like image 29
Michael Berkowski Avatar answered Dec 06 '22 18:12

Michael Berkowski