Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL: Matching records that have x consecutive dates available between two dates

Background / Application

I have a MySQL database containing a table of rentable properties and a table of bookings for these properties. There is also a search feature for finding available properties between two provided dates. When searching, the user can enter the start date, the amount of days they wish to stay, and a date flexibility of up to +/- 7 days. A booking can start on the same day as another booking ends (party 1 leaves in the morning, party 2 arrives in the evening).

I am having difficulty implementing the flexibility feature efficiently.

Schema

CREATE TABLE IF NOT EXISTS `property` (
    `id` bigint(20) NOT NULL AUTO_INCREMENT,
    `name` varchar(60) COLLATE utf8_unicode_ci DEFAULT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `property_booking` (
    `id` bigint(20) NOT NULL AUTO_INCREMENT,
    `property_id` bigint(20) DEFAULT NULL,
    `name` varchar(60) COLLATE utf8_unicode_ci DEFAULT NULL,
    `date_start` date DEFAULT NULL,
    `date_end` date DEFAULT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Sample Data

INSERT INTO `property` (`name`) 
VALUES ('Property 1'), ('Property 2'), ('Property 3');

INSERT INTO `property_booking` (`property_id`,`name`,`date_start`,`date_end`) 
VALUES (1, 'Steve', '2011-03-01', '2011-03-08'), 
(2, 'Bob', '2011-03-13', '2011-03-20'), 
(3, 'Jim', '2011-03-16', '2011-03-23');

Sample Scenario

The user selects that they want to start their stay on 2011-03-10, they want to stay for 7 days, and they have a flexibility of +/- 2 days. I have compiled an image that visualises the data and parameters below. (Red: Booking 1, Green: Booking 2, Stripes: Booking 3, Blue: Date range (2011-03-10, + 7 days and +/- 2 days flexibility))

Expected Result

Property 1 (Bookings available throughout date range)
Property 3 (Bookings available starting on 2011-03-08 or 2011-03-09)

Current Method

My current query checks for overlap for all 7 day date ranges within the total searchable date range, like this:

SELECT p.`id`, p.`name` 
FROM `property` p 
WHERE (NOT (EXISTS (SELECT p2.`name` FROM `property_booking` p2 WHERE (p2.`property_id` = p.`id` AND '2011-03-10' < DATE_SUB(p2.`date_end`, INTERVAL 1 DAY) AND '2011-03-17' > DATE_ADD(p2.`date_start`, INTERVAL 1 DAY))))) 
OR (NOT (EXISTS (SELECT p3.`name` FROM `property_booking` p3 WHERE (p3.`property_id` = p.`id` AND '2011-03-11' < DATE_SUB(p3.`date_end`, INTERVAL 1 DAY) AND '2011-03-18' > DATE_ADD(p3.`date_start`, INTERVAL 1 DAY))))) 
OR (NOT (EXISTS (SELECT p4.`name` FROM `property_booking` p4 WHERE (p4.`property_id` = p.`id` AND '2011-03-09' < DATE_SUB(p4.`date_end`, INTERVAL 1 DAY) AND '2011-03-16' > DATE_ADD(p4.`date_start`, INTERVAL 1 DAY))))) 
OR (NOT (EXISTS (SELECT p5.`name` FROM `property_booking` p5 WHERE (p5.`property_id` = p.`id` AND '2011-03-12' < DATE_SUB(p5.`date_end`, INTERVAL 1 DAY) AND '2011-03-19' > DATE_ADD(p5.`date_start`, INTERVAL 1 DAY)))))
OR (NOT (EXISTS (SELECT p6.`name` FROM `property_booking` p6 WHERE (p6.`property_id` = p.`id` AND '2011-03-08' < DATE_SUB(p6.`date_end`, INTERVAL 1 DAY) AND '2011-03-15' > DATE_ADD(p6.`date_start`, INTERVAL 1 DAY)))));

On the sample dataset, it's reasonably quick, but on much larger datasets it's going to get pretty sluggish, even more so when you build the full +/- 7 day flexibility.

Does anyone have any suggestions as to how this query could be better written?

like image 639
Kris Avatar asked Feb 17 '11 16:02

Kris


1 Answers

Ok, here's a tricky answer for a tricky question...

SELECT * FROM property AS p
LEFT JOIN  
(
  SELECT property_id, DATEDIFF(MAX(date_end),20110308) AS startblock, 
      DATEDIFF(20110319,MIN(date_start))-1 AS endblock
  FROM property_booking AS pb
  WHERE date_start < 20110319 || date_end >= 20110308 
  GROUP BY property_id
  HAVING LEAST(startblock,endblock) > 4
) AS p2 ON p.id = p2.property_id 
WHERE p2.property_id IS NULL;

The subquery selects all the properties that are not eligible. The LEFT JOIN with IS NULL basically works out the exclusion (negation on the ineligible properties)

  • 20110308 is the desired start date -2 days ( because +/-2 day flexibility)
  • 20110319 is the desired end date +2 days
  • The number 4 in the HAVING LEAST(startblock,endblock) > 4 in twice your +/- number (2*2)

It took me a while to work it out (but your question was interesting and I had time on my hand)

I've tested it with edge cases and it worked for all the test cases I've thrown at it...). The logic behind it is a bit odd but a good old pen and paper helped me work it out!

Edit

Unfortunately I realized that this will work for most cases but not all... (2 single day bookings at the very beginning and end of the lookup period makes a property unavailable even though it should be available).

The problem here is that you have to look up information that's not 'present' in the DB and reconstruct it from what data you have. Check out my comment on your question to see a better way to deal with the problem

like image 160
Damp Avatar answered Oct 19 '22 23:10

Damp