Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GROUP BY with aggregate and an INNER JOIN

Tags:

sql

mysql

I tried to narrow down the problem as much as possible, it is still quite something. This is the query that doesn't work the way I want it:

SELECT *, MAX(tbl_stopover.dist)
FROM tbl_stopover
INNER JOIN
  (SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
  FROM tbl_edges edges1
  INNER JOIN tbl_edges edges2
  ON edges1.nodeB = edges2.nodeA
  GROUP BY edges1.id HAVING numConn = 1) AS tbl_conn
ON tbl_stopover.id_edge = tbl_conn.id1
GROUP BY id_edge

Here is what I get:

|id | edge | dist | id1 | id2 | numConn | MAX(tbl_stopover.dist) |
------------------------------------------------------------------
|2  | 23   | 2    | 23  | 35  | 1       | 9                      |
|4  | 24   | 5    | 24  | 46  | 1       | 9                      |
------------------------------------------------------------------

and this is what I would want:

|id | edge | dist | id1 | id2 | numConn | MAX(tbl_stopover.dist) |
------------------------------------------------------------------
|3  | 23   | 9    | 23  | 35  | 1       | 9                      |
|5  | 24   | 9    | 24  | 46  | 1       | 9                      |
------------------------------------------------------------------

But let me elaborate a bit...

I have a graph, let's say as such:

    node1
      |
    node2
   /     \
node3    node4
  |       |
node5    node6

Therefore I have a table I call tbl_edges like this:

| id  | nodeA | node B |
------------------------
| 12  |   1   |    2   |
| 23  |   2   |    3   |
| 24  |   2   |    4   |
| 35  |   3   |    5   |
| 46  |   4   |    6   |
------------------------

Now each edge has "stop_overs" at a certain distance (to nodeA). Therefore I have a table tbl_stopover like this:

| id  | edge  |  dist  |
------------------------
|  1  |  12   |    5   |
|  2  |  23   |    2   |
|  3  |  23   |    9   |
|  4  |  24   |    5   |
|  5  |  24   |    9   |
|  6  |  35   |    5   |
|  7  |  46   |    5   |
------------------------

Why this query?
Let's assume I want to calculate the distance between the stop_overs. Within one edge that is no problem. Across edges it gets more difficult. But if I have two edges that are connected and there is no other connection I can also calculate the distance. Here an example assuming all edges have a length of 10. :

edge23 has a stop_over(id=3) at dist=9, edge35 has a stop_over(id=6) at dist=5. Therefore the distance between these two stop_overs is:

dist = (length - dist_id3) + dist_id5 = (10-9) + 5

I am not sure if I made my self clear. If this is not understandable, feel free to ask question and I will do my best to make this more understandable.

like image 604
AudioDroid Avatar asked Jul 07 '11 13:07

AudioDroid


1 Answers

MySQL allows you to do something silly - display fields in an aggregate query that are not a part of the GROUP BY or an aggregate function like MAX. When you do this, you get random (as you said) results for the remaining fields.

In your query you are doing this twice - once in your inner query (id2 is not part of a GROUP BY or aggregate) and once in the outer.

Prepare for random results!

To fix it, try something like this:

SELECT tbl_stopover.id,
       tbl_stopover.dist,
       tbl_conn.id1,
       tbl_conn.id2,
       tbl_conn.numConn,
       MAX(tbl_stopover.dist)
FROM tbl_stopover
INNER JOIN
  (SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
  FROM tbl_edges edges1
  INNER JOIN tbl_edges edges2
  ON edges1.nodeB = edges2.nodeA
  GROUP BY edges1.id, edges2.id
  HAVING numConn = 1) AS tbl_conn
ON tbl_stopover.id_edge = tbl_conn.id1
GROUP BY tbl_stopover.id,
         tbl_stopover.dist,
         tbl_conn.id1,
         tbl_conn.id2,
         tbl_conn.numConn

The major changes are the explicit field list (note that I removed the id_edge since you are joining on id1 and already have that field), and addition of additional fields to both the inner and outer GROUP BY clauses.

If this gives you more rows than you want then you may need to explain more about your desired result set. Something like this is the only way to ensure you get appropriate groupings.

like image 167
JNK Avatar answered Oct 02 '22 02:10

JNK