Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL joining tables group by sum issue

I have problem with joining tables, here are example tables:

Table A: ( 30 rows)

╔════╦════════════╦═════════════╗
║ ID ║ Name       ║ Description ║
╠════╬════════════╬═════════════╣
║ 1  ║ Type       ║ Unicode Art ║
║ 2  ║ Header     ║ Spreadsheet ║
║ 3  ║ Auto Align ║ Off         ║
╚════╩════════════╩═════════════╝

Table B: ( 100 rows )

╔════╦════════════╦═════════════╦═════════╗
║ ID ║ Name       ║ Description ║ TableA  ║
╠════╬════════════╬═════════════╬═════════╣
║ 1  ║ Type       ║ Unicode Art ║ 1       ║
║ 2  ║ Header     ║ Spreadsheet ║ 1       ║
║ 3  ║ Auto Align ║ Off         ║ 2       ║
╚════╩════════════╩═════════════╩═════════╝

Table C: ( 8000 rows )

╔════╦════════════╦═════════════╦═════════╗
║ ID ║ Article    ║ Text        ║ TableB  ║
╠════╬════════════╬═════════════╬═════════╣
║ 1  ║ Type       ║ Unicode Art ║ 1       ║
║ 2  ║ Header     ║ Spreadsheet ║ 1       ║
║ 3  ║ Auto Align ║ Off         ║ 2       ║
╚════╩════════════╩═════════════╩═════════╝

Table D: ( 100 000 rows and counting )

╔════╦═══════════╦════════════╦═════════════╦═════════╗
║ ID ║ Date      ║ Clicks     ║ Impressions ║ TableC  ║
╠════╬═══════════╬════════════╬═════════════╬═════════╣
║ 1  ║ 20120814  ║ 10         ║ 3           ║ 1       ║
║ 2  ║ 20120815  ║ 13         ║ 5           ║ 1       ║
║ 3  ║ 20120816  ║ 15         ║ 10          ║ 2       ║
╚════╩═══════════╩════════════╩═════════════╩═════════╝

Table E: ( 200 000 rows and counting )

╔════╦═══════════╦════════════╦═══════════╦═════════╗
║ ID ║ Date      ║ Views      ║ Visitors  ║ TableC  ║
╠════╬═══════════╬════════════╬═══════════╬═════════╣
║ 1  ║ 20120814  ║ 10         ║ 3         ║ 1       ║
║ 2  ║ 20120815  ║ 13         ║ 5         ║ 1       ║
║ 3  ║ 20120816  ║ 15         ║ 10        ║ 2       ║
║ 4  ║ 20120817  ║ 8          ║ 7         ║ 2       ║
║ 5  ║ 20120818  ║ 9          ║ 4         ║ 2       ║
╚════╩═══════════╩════════════╩═══════════╩═════════╝

I query this tables with single sql statement:

SELECT
       A.name,
       A.Description,
       SUM(D.clicks),
       SUM(D.Impressions),
       SUM(E.Views),
       SUM(E.Visitors)
FROM 
       A 
       LEFT JOIN B
         ON A.ID=B.TableA
       LEFT JOIN C
         ON B.ID=C.TableB
       LEFT JOIN D
         ON C.ID=D.TableC
       LEFT JOIN E
         ON C.ID=E.TableC
GROUP BY
       A.ID

Problem is that the query returns invalid SUM for Table D and Table E
However if query Table D and Table E in invidual queries I get right values:

SELECT
       A.name,
       A.Description,
       SUM(D.clicks),
       SUM(D.Impressions)
FROM 
       A 
       LEFT JOIN B
         ON A.ID=B.TableA
       LEFT JOIN C
         ON B.ID=C.TableB
       LEFT JOIN D
         ON C.ID=D.TableC
GROUP BY
       A.ID

EDIT 1:
I have tried RIGHT JOIN, JOIN, LEFT OUTER JOIN none of them worked,
ofcourse it's possible that I used those in wrong places.
But in the first statement where I got "all included" values are multiplied
many thousand times higher than they really are.

like image 500
Jompper Avatar asked Aug 19 '12 11:08

Jompper


2 Answers

You need to flatten both D and E table. Then I suppose A and B are mere lookups for C, so there's no need to do GROUP BY on A: http://www.sqlfiddle.com/#!2/fccf1/8

I removed the noise(A and B), as I can't see(yet) how A and B are related to summarizing the information for C

Try this:

SELECT
       C.Article,
       C.Text,

       COALESCE(D.ClicksSum,0) AS ClicksSum,
       COALESCE(D.ImpressionsSum,0) AS ImpressionsSum,

       COALESCE(E.ViewsSum,0) AS ViewsSum,
       COALESCE(E.VisitorsSum,0) AS VisitorsSum
FROM 
       C

       LEFT JOIN 
       (
         SELECT TableC, SUM(Clicks) AS ClicksSum, SUM(Impressions) AS ImpressionsSum
         FROM D
         GROUP BY TableC
       ) D ON C.ID=D.TableC

       LEFT JOIN 
       (
         SELECT TableC, SUM(Views) AS ViewsSum, SUM(Visitors) AS VisitorsSum
         FROM E
         GROUP BY TableC       
       ) E ON C.ID=E.TableC

Output:

|    ARTICLE |        TEXT | CLICKSSUM | IMPRESSIONSSUM | VIEWSSUM | VISITORSSUM |
----------------------------------------------------------------------------------
|       Type | Unicode Art |        23 |              8 |       23 |           8 |
|     Header | Spreadsheet |        15 |             10 |       32 |          21 |
| Auto Align |         Off |         0 |              0 |        0 |           0 |

Note that I didn't type those schema in my sqlfiddle post manually, I uses sqlfiddle's Text to DDL

I love http://sqlfiddle.com, its Text to DDL can even parses out the data from your ASCII art ツ


Upon seeing your clearer objective(from your comment), this might be it: http://www.sqlfiddle.com/#!2/fccf1/13

SELECT
       A.Name, A.Description,

       COALESCE(SUM(D.ClicksSum),0) AS ClicksSum,
       COALESCE(SUM(D.ImpressionsSum),0) AS ImpressionsSum,

       COALESCE(SUM(E.ViewsSum),0) AS ViewsSum,
       COALESCE(SUM(E.VisitorsSum),0) AS VisitorsSum
FROM 
       C

       LEFT JOIN 
       (
         SELECT TableC, SUM(Clicks) AS ClicksSum, SUM(Impressions) AS ImpressionsSum
         FROM D
         GROUP BY TableC
       ) D ON C.ID=D.TableC

       LEFT JOIN 
       (
         SELECT TableC, SUM(Views) AS ViewsSum, SUM(Visitors) AS VisitorsSum
         FROM E
         GROUP BY TableC       
       ) E ON C.ID=E.TableC

       RIGHT JOIN B ON B.ID = C.TableB
       RIGHT JOIN A ON A.ID = B.TableA

GROUP BY A.ID

Output:

|       NAME | DESCRIPTION | CLICKSSUM | IMPRESSIONSSUM | VIEWSSUM | VISITORSSUM |
----------------------------------------------------------------------------------
|       Type | Unicode Art |        38 |             18 |       55 |          29 |
|     Header | Spreadsheet |         0 |              0 |        0 |           0 |
| Auto Align |         Off |         0 |              0 |        0 |           0 |

The above approach might still produce cartesian products, flatten the SubCategory(B) before grouping it to Category(A): http://www.sqlfiddle.com/#!2/fccf1/19

SELECT 
  A.Name, A.Description,
  COALESCE(SUM(B.ClicksSum),0) AS ClicksSum,
  COALESCE(SUM(B.ImpressionsSum),0) AS ImpressionsSum,
  COALESCE(SUM(B.ViewsSum),0) AS ViewsSum,
  COALESCE(SUM(B.VisitorsSum),0) AS VisitorsSum
FROM A
LEFT JOIN
(

  SELECT 
    B.ID, B.TableA,
    SUM(C.ClicksSum) AS ClicksSum,
    SUM(C.ImpressionsSum) AS ImpressionsSum,
    SUM(C.ViewsSum) AS ViewsSum,
    SUM(C.VisitorsSum) AS VisitorsSum
  FROM B
  LEFT JOIN
  (

    SELECT
      C.TableB,

      D.ClicksSum,
      D.ImpressionsSum,

      E.ViewsSum,
      E.VisitorsSum
    FROM 
    C

    LEFT JOIN 
    (
      SELECT TableC, SUM(Clicks) AS ClicksSum, SUM(Impressions) AS ImpressionsSum
      FROM D
      GROUP BY TableC
    ) D ON C.ID=D.TableC

    LEFT JOIN 
    (
      SELECT TableC, SUM(Views) AS ViewsSum, SUM(Visitors) AS VisitorsSum
      FROM E
      GROUP BY TableC       
    ) E ON C.ID=E.TableC

  ) C ON C.TableB = B.ID
  GROUP BY B.ID

) B ON B.TableA = A.ID
GROUP BY A.ID

Output:

|       NAME | DESCRIPTION | CLICKSSUM | IMPRESSIONSSUM | VIEWSSUM | VISITORSSUM |
----------------------------------------------------------------------------------
|       Type | Unicode Art |        38 |             18 |       55 |          29 |
|     Header | Spreadsheet |         0 |              0 |        0 |           0 |
| Auto Align |         Off |         0 |              0 |        0 |           0 |
like image 177
Michael Buen Avatar answered Nov 15 '22 19:11

Michael Buen


Both table D and table E are linked to the rest by table C. So your first query gives a cartesian product of all rows in table D times all rows in table E, and the SUM function aggregates this cartesian product. Probably you will have to group by table C as well, not only by table A.

like image 21
Alex Monthy Avatar answered Nov 15 '22 19:11

Alex Monthy