Context:
The best example is AirBnB. Let's say I have 5 apartments. Each apartment has a calendar that represents its availability. When a vacationer travels to my city and searches for apartments using a given start date and end date, any of my apartments whose calendar shows that period as available should appear in the vacationer's search results.
One bit at a time:
Obviously there's a lot in the above. The scope of this question is how I should set up the database for the list of apartments, including their availability. Before building a database, I spent some time coordinating things manually in Excel just to get a clearer picture in my head of what everything should look like. In Excel, the column headers that worked for the table are:
The calendar is what I'm having trouble with right now. In my Excel sheet, the columns are literally just dates going on to eternity. Whenever a vacationer submits a request, I find all the apartments for which each date cell is empty (i.e., available). Then I send the vacationer these apartments. When s/he makes a booking, I go back to the Excel sheet and mark each date cell for the chosen apartment as unavailable.
I want to get more opinions... is this the right way to imagine my database in PostgreSQL? And if so, can I just write a migration that looks like the below?
class CreateApartments < ActiveRecord::Migration
  def change
    create_table :apartments do |t|
      t.string :apt_name
      t.integer :apt_owner
      t.text :apt_description
      # One column per calendar date (note: this creates thousands of columns)
      (Date.today..Date.new(2034, 12, 31)).each do |date|
        t.date date.to_s
      end
      t.timestamps
    end
  end
end
You should not store availability but its opposite: the fact that an apartment is booked for a specific period. Without any deeper analysis, I would do something as simple as:
owner
  owner_id
  owner_name

apartment
  apartment_id
  apartment_name
  apartment_description
  owner_id

customer
  customer_id
  customer_name

booking
  booking_id
  customer_id
  apartment_id
  booking_start
  booking_end
In case a booking can cover disjoint days:
booking
  booking_id
  customer_id
  apartment_id

booking_calendar
  booking_id
  booking_date
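For the disjoint-day variant, a hypothetical availability check could join through booking_calendar instead (a sketch only, assuming the table and column names above):

```sql
-- Sketch: an apartment is available for the requested period when none of
-- its booked calendar dates fall inside that period.
select a.*
from apartment a
where not exists (
  select 1
  from booking b
  join booking_calendar c on c.booking_id = b.booking_id
  where b.apartment_id = a.apartment_id
    and c.booking_date between <<required_start>> and <<required_end>>
)
```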
In either case you will be able to return the list of available apartments quite easily:
select
  *
from
  apartments a
where not exists (
  select
    1
  from
    bookings b
  where
    a.apartment_id = b.apartment_id
    and b.booking_start <= <<required_end>>
    and b.booking_end >= <<required_start>>
)
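The `not exists` check relies on the standard interval-overlap test: two closed ranges overlap exactly when each one starts on or before the other ends. Testing only the requested endpoints with BETWEEN would miss a booking that lies entirely inside the requested period. A minimal sketch of the test in Ruby (the method name is mine):

```ruby
require 'date'

# Two closed date ranges overlap iff each starts on or before the other ends.
# This catches partial overlaps and the case where one range fully contains
# the other.
def ranges_overlap?(start_a, end_a, start_b, end_b)
  start_a <= end_b && start_b <= end_a
end

booking_start = Date.new(2024, 7, 10)
booking_end   = Date.new(2024, 7, 15)

# Requested stay surrounds the booking entirely -> still a conflict.
ranges_overlap?(Date.new(2024, 7, 1), Date.new(2024, 7, 31),
                booking_start, booking_end)  # => true

# Requested stay ends before the booking begins -> no conflict.
ranges_overlap?(Date.new(2024, 7, 1), Date.new(2024, 7, 9),
                booking_start, booking_end)  # => false
```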
I think you will find PostgreSQL's support for range types and associated overlapping tests useful here.
You can index on ranges and even use a constraint to prevent an apartment being booked twice on the same day.
With ranges you can store bookings or availability as you prefer.
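For example (a sketch; the table and column names are mine), a bookings table keyed on a daterange with an exclusion constraint that rejects overlapping bookings for the same apartment:

```sql
-- btree_gist is needed to mix the equality test on apartment_id
-- with the range-overlap test (&&) in one GiST exclusion constraint.
create extension if not exists btree_gist;

create table bookings (
  booking_id   serial primary key,
  apartment_id integer not null,
  stay         daterange not null,
  exclude using gist (apartment_id with =, stay with &&)
);

-- Availability search then becomes a single overlap test:
-- select * from apartments a
-- where not exists (
--   select 1 from bookings b
--   where b.apartment_id = a.apartment_id
--     and b.stay && daterange(<<required_start>>, <<required_end>>, '[]')
-- );
```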
The question is about PostgreSQL, but I wanted to share what I've done in MySQL and its performance results. I hope this applies to other databases as well.
I created a listings table and a listing_availabilities table, which stores the ranges of time during which a listing is available. It doesn't matter whether I store availabilities or 'unavailabilities', because each is the complement of the other; the choice changes the query conditions, not the size of the dataset.
CREATE TABLE `listings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(64) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `listing_availabilities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`listing_id` int(11) NOT NULL,
`start_time` int(11) NOT NULL,
`end_time` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `start_time` (`start_time`),
KEY `end_time` (`end_time`),
KEY `listing_id` (`listing_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
There is a concern that if you make a listing available for one year, day by day, you would need to insert 365 rows. Suppose you have 1 million listings; then you will have a minimum of 365 million rows of listing availabilities. Yes, that's correct: you either store that many rows, or store one row with a large blob of data that is hard to filter on. So the only way to know whether this scales is to actually test it.
I've added 1 million rows to listings and 365 million rows to listing_availabilities, so each listing has 365 daily availability ranges.
Here are some practical queries and their performance.
Get all availabilities for a specific listing
> SELECT * FROM `listing_availabilities` WHERE `listing_id` = 716384;
> 365 rows in set (0.01 sec)
Check whether a specific time is available for a listing (1 if available, 0 if not). Note that EXISTS is a reserved word in MySQL, so the alias needs a different name:
> SELECT count(*) AS is_available FROM `listing_availabilities`
> WHERE `listing_id` = 1234
> AND 1481527584 BETWEEN `start_time` AND `end_time`
> LIMIT 1;
> 1 row in set (0.00 sec)
Get the first 50 listings, joined with each listing's first available time slot
> SELECT *
> FROM listings AS l
> LEFT JOIN listing_availabilities a
> ON a.listing_id = l.id
> AND a.start_time = (SELECT start_time FROM listing_availabilities WHERE l.id = listing_id ORDER BY start_time ASC LIMIT 1)
> LIMIT 50;
> 50 rows in set (0.05 sec)
Get the first 50 listings for which a specific timestamp is available (assuming matching availabilities exist)
> SELECT *
> FROM listings AS l
> LEFT JOIN listing_availabilities a
> ON a.listing_id = l.id
> AND a.start_time =
> (SELECT start_time FROM listing_availabilities WHERE l.id = listing_id AND start_time < 1481536932 AND end_time > 1481536932 LIMIT 1)
> WHERE a.start_time IS NOT NULL
> LIMIT 50;
> 50 rows in set (0.05 sec)
The same query when the specific timestamp is not available for any listing (no availability matches)
> SELECT *
> FROM `listings` AS l
> LEFT JOIN `listing_availabilities` a
> ON a.listing_id = l.id
> AND a.start_time =
> (SELECT `start_time` FROM `listing_availabilities` WHERE l.id = listing_id AND `start_time` < 1481530494 AND `end_time` > 1481530494 LIMIT 1)
> WHERE a.start_time IS NOT NULL
> LIMIT 50;
> Empty set (3.00 sec)
This one took much longer; with the help of EXPLAIN EXTENDED, it seems MySQL scans the whole table when the subquery uses inequalities on multiple columns. I'm still not 100% sure why this is the case; perhaps someone can enlighten me here.
The alternative is to left join all availabilities to each listing and then filter from that set (the WHERE clause on the joined columns effectively turns this into an inner join):
> SELECT l.id, a.start_time, a.end_time
> FROM listings as l
> LEFT JOIN listing_availabilities a
> ON l.id = a.listing_id
> WHERE 1481530494
> BETWEEN a.start_time
> AND a.end_time limit 50;
> Empty set (0.01 sec)
Automatically insert listing_availabilities rows for the 365th day ahead
> INSERT INTO `listing_availabilities`
> (listing_id, start_time, end_time)
> VALUES
> (8, 1481689555, 1481689556) ...
> Query OK, 500 rows affected (0.01 sec)
> Records: 500 Duplicates: 0 Warnings: 0
Remove old availabilities
> DELETE FROM `listing_availabilities`
> WHERE `end_time` < 1481671237
> LIMIT 100000;
> Query OK, 100000 rows affected (10.43 sec)
This takes a while and locks the table, so I've decided to do it in smaller chunks, as such:
> DELETE FROM `listing_availabilities`
> WHERE end_time < 1481671237
> LIMIT 500;
> Query OK, 500 rows affected (0.09 sec)
and do this until affected rows become zero.
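The "delete until zero rows affected" loop can be sketched generically; `purge_in_chunks` and the simulated store below are mine, and in practice the block would issue the DELETE ... LIMIT statement and return the affected row count:

```ruby
# Repeatedly delete fixed-size batches until a batch removes nothing.
# The block stands in for `DELETE FROM listing_availabilities
# WHERE end_time < ? LIMIT batch_size` and returns the rows affected.
def purge_in_chunks(batch_size)
  total = 0
  loop do
    affected = yield(batch_size)
    total += affected
    break if affected.zero?
  end
  total
end

# Simulate a table with 1,234 expired rows, purged 500 at a time.
remaining = 1234
deleted = purge_in_chunks(500) do |n|
  batch = [remaining, n].min
  remaining -= batch
  batch
end
# deleted == 1234, remaining == 0
```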
Obviously, you need a scheduled script to remove old availabilities and add new ones for the months ahead.
This works for my application and should satisfy the requirements the OP describes. I hope it helps with other database schemas too.