Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL SELECT data between two related tables NOT within a certain date range

I have 2 many to many tables, and a table to join them.

Officer

  • id
  • name

Report

  • id
  • performanceDate
  • title

report_officer

  • officer_id
  • report_id

I want to select all officers that haven't ever been associated with a report or who has not been associated with a report within a certain timeframe.

So far I have tried the following (below doesn't work for me!):

SELECT * 
  FROM Officer 
       LEFT JOIN report_officer 
            ON Officer.id = report_officer.officer_id 
       LEFT JOIN Report 
            ON Report.id = report_officer.report_id
 WHERE (performanceDate IS NULL 
        OR performanceDate < "2014-03-23 00:00:00" 
        OR performanceDate > "2014-04-01 00:00:00"
        )

My left join query works only when the officer has ONLY been associated with a report within a certain timeframe, but fails once they have multiple reports.

Result:

+------------+-----------------+
| officer_id | performanceDate |
+------------+-----------------+
|        130 | NULL            | # good

|        134 | 2014-03-02      | # bad - officer_id 134 has a performanceDate  
|        134 | 2014-03-09      | # on 2014-3-30, I do not want this in the results.
|        134 | 2014-03-16      | # 

|        135 | 2014-03-02      | # good
+------------+-----------------+

SQL Fiddle: http://sqlfiddle.com/#!2/1bf72/3 <- in the sql fiddle, please refer to the 'name' field for which columns I am looking to have returned.

Any ideas on how to make this work?

Ideally I would like to make this as simple as possible to work with my ORM. I am using doctrine and would prefer not to start using totally custom code (so if it can be done with only joins, that would be great). I though have a bad feeling I need a sub-query.

like image 304
PressingOnAlways Avatar asked Mar 21 '23 10:03

PressingOnAlways


2 Answers

SELECT Officer.*, Report.performanceDate FROM Officer
LEFT JOIN report_officer ON Officer.id = report_officer.officer_id 
LEFT JOIN Report ON Report.id = report_officer.report_id
 AND
   (performanceDate > "2014-03-23 00:00:00" AND
    performanceDate < "2014-04-01 00:00:00")
WHERE Report.id IS NULL

You only want to join rows within a specific date range, so you must move the constraint into the on clause of the join and inverse the constraint.

If you want to remove duplicates, you could try a group by:

SELECT Officer.id, MAX(Report.performanceDate) FROM Officer
LEFT JOIN report_officer ON Officer.id = report_officer.officer_id 
LEFT JOIN Report ON Report.id = report_officer.report_id
 AND
   (performanceDate > "2014-03-23 00:00:00" AND
    performanceDate < "2014-04-01 00:00:00")
WHERE Report.id IS NULL
GROUP BY Officer.id

but you have to decide on which date you want to get, if there are multiple performance dates in your requested date range (or you could use GROUP_CONCAT to gather all dates).

Update

Actually I am relatively sure, that what you want to achieve is not possible with LEFT JOINs at all...

What always works are subquery solutions:

SELECT Officer.id as OfficerID, Officer.name,
Report.id as ReportID,
Report.performanceDate

FROM Officer
LEFT JOIN report_officer
  ON Officer.id = report_officer.officer_id 
LEFT JOIN Report
  ON Report.id = report_officer.report_id

WHERE Report.id IS NULL 
OR NOT EXISTS (
    SELECT * FROM report_officer
    INNER JOIN Report ON report_id = Report.id
    WHERE officer_id = Officer.id AND
      performanceDate > "2014-03-23 00:00:00" 
      AND performanceDate < "2014-04-01 00:00:00"
)

but these are not that performant... This one looks if there are reports which should prohibit outputting the row.

like image 125
Ulrich Thomas Gabor Avatar answered Mar 23 '23 05:03

Ulrich Thomas Gabor


I want to select all officers that haven't ever been associated with a report or who has not been associated with a report within a certain timeframe.

Your two conditions are redundant: if an officer has not been associated, ever, then he also cannot have been associated in any timeframe, and will be selected by the second condition. If he has reports in the timeframe, then he isn't selected because of the second condition, but he also has at least one report and cannot satisfy the first.

So what you want is, "an officer with zero reports in the timeframe".

To do so, just invert the conditions: first get those reports within the selected timeframe (i.e. those officers you don't want); then LEFT JOIN with Officer, requiring that the join yields null. This will get you the other officers, those that have no report in the selected time frame (or maybe no report at all).

In this case, though, you cannot have a report date, since you have no reports (this is more apparent for those officers with no report at all):

SELECT
   Officer.id as OfficerID,
   Officer.name,
   MAX(Report.id) as ReportID,
   MAX(performanceDate) AS performanceDate
FROM Officer
LEFT JOIN report_officer ON (Officer.id = report_officer.officer_id)
LEFT JOIN Report ON (Report.id = report_officer.report_id 
   AND performanceDate BETWEEN 20140323 AND 20140401)
GROUP BY Officer.id, Officer.name
HAVING ReportID IS NULL;

I don't know about Doctrine and HAVING. If you cannot use a HAVING clause, you may try to emulate it by running this, which ought to be quite standard:

SELECT
   Officer.id as OfficerID,
   Officer.name,
   COUNT(Report.id) as reports
FROM Officer
LEFT JOIN report_officer ON (Officer.id = report_officer.officer_id)
LEFT JOIN Report ON (Report.id = report_officer.report_id 
   AND performanceDate BETWEEN 20140323000000 AND 20140401235959)
GROUP BY Officer.id, Officer.name;

and then apply a filter where reports is equal to 0, i.e., no reports in the given timeframe. You can add MAX(performanceDate) AS performanceDate, MAX(Report.id) AS ReportID to get the date of the (e.g. latest) report for those officers who do have at least one outside the timeframe. This might be not the report you'd want.

Care must be taken when specifying date range, since YYYYMMDD equals normally YYYYMMDD000000, which could lead to the equivalent of a semi-inclusive range. Otherwise, replace BETWEEN with performanceDate >= '2014-03-23 00:00:00' AND performanceDate <= '2014-04-01 23:59:59'.

like image 31
LSerni Avatar answered Mar 23 '23 04:03

LSerni