Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Join gives unexpected (but pleasant) results

I have three tables I'm pulling from currently. reports, berries and melons. I set up my query like this, and it gets me exactly what I want.

SELECT 
   rpt.*, 
   ber.shipper, ber.po, ber.commodity, ber.label

FROM reports rpt

LEFT JOIN berries ber ON rpt.inspection_number = ber.report_key
LEFT JOIN melons mel ON rpt.inspection_number = mel.report_key

WHERE rpt.status='1'  OR rpt.status='0'
ORDER BY rpt.inspection_number DESC

I'm getting my expected return which is

key | role | region   | inspection_type | inspection_number | shipper   | po    | commodity     | label
3   | NULL | Seattle  | melons          | 5555              | Shipper1  | PO2   | Commodity2    | Label2
2   | NULL | Seattle  | berries         | 1023              | Shipper1  | PO1   | Commodity1    | Label1

If though I remove LEFT JOIN melons mel ON rpt.inspection_number = mel.report_key from my statement I get the exact same thing.... I never mentioned melons??

If I revise and use JOIN instead for berries

SELECT 
   rpt.*, 
   ber.shipper, ber.po, ber.commodity, ber.label

FROM reports rpt

JOIN berries ber ON rpt.inspection_number = ber.report_key

WHERE rpt.status='1'  OR rpt.status='0'
ORDER BY rpt.inspection_number DESC

It produces what I expected it should!

key | role | region   | inspection_type | inspection_number | shipper   | po    | commodity     | label
2   | NULL | Seattle  | berries         | 1023              | Shipper1  | PO1   | Commodity1    | Label1

But trying to revise my SQL statement like so....

SELECT 
   rpt.*, 
   ber.shipper, ber.po, ber.commodity, ber.label
   mel.shipper, mel.po, mel.commodity, mel.label

FROM reports rpt

JOIN berries ber ON rpt.inspection_number = ber.report_key
JOIN melons mel ON rpt.inspection_number = mel.report_key

WHERE rpt.status='1'  OR rpt.status='0'
ORDER BY rpt.inspection_number DESC

Nets me....

MySQL returned an empty result set (i.e. zero rows). ( Query took 0.0011 sec ) Gives me the big middle finger. What the hell? Can someone explain what I'm evidently doing wrong, and how to fix it?

like image 217
ehime Avatar asked Feb 21 '23 10:02

ehime


2 Answers

It's not so complex. Your first query, you're joining against mel but never doing anything with it, so you're only getting ber's data. Your last query is closer, but because you're inner joining against berries and melons and you don't have any reports that are both, you get no results. But the answer is closer to what you're doing in the second query, and I think what you want is this:

SELECT 
   rpt.*, 
   COALESCE(ber.shipper, mel.shipper) AS shipper,
   COALESCE(ber.po, mel.po) AS po, 
   COALESCE(ber.commodity, mel.commodity) AS commodity,
   COALESCE(ber.label, mel.label) AS label
FROM reports rpt
LEFT JOIN berries ber ON rpt.inspection_number = ber.report_key
LEFT JOIN melons mel ON rpt.inspection_number = mel.report_key
WHERE rpt.status='1'  OR rpt.status='0'
ORDER BY rpt.inspection_number DESC

This query says, give me rows where there's a join in berries or melons, but for the columns they have in common, give me whichever one exists. We're taking ber first for no particular reason.

Assuming these two tables are mutually exclusive, I think this does what you want.

Edit: building on what @MarcusAdams points out below, this could be rewritten to use a UNION if there are an obnoxious number of fruit tables:

SELECT report_key, shipper, po, commodity, label FROM berries
UNION
SELECT report_key, shipper, po, commodity, label FROM melons
UNION
SELECT report_key, shipper, po, commodity, label FROM ...
...

This query will give you something handy you can use as a subquery (or view) later on. You can also hard-code an origin name like so:

SELECT report_key, shipper, po, commodity, label, 'berries' AS type FROM berries
UNION
SELECT report_key, shipper, po, commodity, label, 'melons' FROM melons
UNION
SELECT report_key, shipper, po, commodity, label, '...' FROM ...
...

Then to use this in your original query, you would embed it like so:

SELECT *
FROM reports rpt,
JOIN (SELECT report_key, shipper, po, commodity, label, 'berries' AS type FROM berries
      UNION
      SELECT report_key, shipper, po, commodity, label, 'melons' FROM melons
      UNION
      SELECT report_key, shipper, po, commodity, label, '...' FROM ...
      ...) fruits ON rpt.inspection_number = fruits.report_key
WHERE rpt.status='1'  OR rpt.status='0'
ORDER BY rpt.inspection_number DESC
like image 84
Daniel Lyons Avatar answered Feb 23 '23 23:02

Daniel Lyons


The below query that Daniel Lyons provided works great, but I'd like to discuss it a little bit, and strictly for academic purposes, offer another solution, which is likely to be more optimized.

Here's Daniel's query:

SELECT 
   rpt.*, 
   COALESCE(ber.shipper, mel.shipper) AS shipper,
   COALESCE(ber.po, mel.po) AS po, 
   COALESCE(ber.commodity, mel.commodity) AS commodity,
   COALESCE(ber.label, mel.label) AS label
FROM reports rpt
LEFT JOIN berries ber ON rpt.inspection_number = ber.report_key
LEFT JOIN melons mel ON rpt.inspection_number = mel.report_key
WHERE rpt.status='1'  OR rpt.status='0'
ORDER BY rpt.inspection_number DESC

This query works great, and for only two fruit, it's fairly optimized. Still since the reports are mutually exclusive, the query is attempting an extra join than is necessary. For example, if a report record is already joined to a berry record, we know it's not going to be joined to a melon record, but MySQL doesn't know that. Instead, MySQL will do another lookup to attempt a join to the melon table, even though a corresponding record won't be found.

With only two joins, half of the join attempts are wasted. However, with three fruits, two-thirds of the joins attempts are wasted, with four fruits, three-fourths of the join attempts are wasted, and so on.

To avoid the extra join attempts, we can reverse the order of the joins, like this:

(SELECT rpt.*, ber.shipper, ber.po, ber.commondity, ber.label
FROM berries ber
JOIN reports rpt
  ON rpt.inspection_number = ber.report_key
WHERE rpt.status = '1' OR rpt.status = '0')
UNION ALL
(SELECT rpt.*, mel.shipper, mel.po, mel.commondity, mel.label
FROM melons mel
JOIN reports rpt
  ON rpt.inspection_number = mel.report_key
WHERE rpt.status = '1' OR rpt.status = '0')
ORDER BY inspection_number DESC

Here, we're starting from the other direction (with the fruit), and joining back to the reports. This allows us to make just one join per report.

Notice that we're now using INNER JOIN instead of LEFT JOIN for each fruit, and we're using UNION ALL to join the results for each fruit into one larger result set.

For further optimization, sometimes MySQL doesn't recognize two constants like 1 and 0 as a range, especially if it's not an integer field. Range lookups are faster than two individual lookups, so to hint to MySQL that the rpt.status of 1 and 0 are a range, use BETWEEN instead of OR, assuming you have a covering index of (rpt.inspection_number, rpt.status).

like image 34
Marcus Adams Avatar answered Feb 23 '23 23:02

Marcus Adams