Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create a database that shows a calendar of availability for a set of apartments?

Context:

The best example is AirBnB. Let's say I have 5 apartments. Each apartment has a calendar that represents it's availability. When a vacationer travels to my city and searches for apartments using a given start date and end date, if that period of time shows up as available on the calendar for any of my apartments, I want those apartments to be shown in search results for the vacationer.

One bit at a time:

Obviously there's a lot in the above. The scope of this question is how I should set up database for the list of apartments that includes their availability. Before building a database, I spent some time manually coordinating in Excel just to get a clearer picture in my head of what everything should look like. In the Excel, what I found worked to be column headers for table are:

  • apartment_name
  • owner_id
  • apartment_description
  • calendar

Calendar right now is what I'm having trouble with. Literally in my Excel, the columns are just dates going on to eternity. Whenever a vacationer submits a request, I find all the apartments for which each date cell is empty (e.g., available). Then I send the vacationer these apartments. When s/he makes a booking, I go back to the Excel and mark unavailable in each date cell for the specific apartment chosen.

I want to get more opinions... is this the right way I should imagine my database in PostGreSQL? And if so... can I just make a migration that looks like below?

class CreateApartments < ActiveRecord::Migration
  def change
    create_table :apartments do |t|
      t.string  :apt_name
      t.integer :apt_owner
      t.text    :apt_description

      Date.today..Date.new(2034, 12, 31)).each do |date|
          t.date :date
      end

      t.timestamps
    end
  end
end
like image 261
james Avatar asked Jun 05 '14 17:06

james


3 Answers

You should not store availability but the opposite (apartment is booked for a specific date). Without any deeper analysis I would do something as simple as:

owner
  owner_id
  owner_name

apartment
  apartment_id
  apartment_name
  apartment_description
  owner_id

customer
  customer_id
  customer_name

booking
  booking_id
  customer_id
  apartment_id
  booking_start
  booking_end

In case when one can book disjoint days:

booking
  booking_id
  customer_id
  apartment_id

booking_calendar
  booking_id
  booking_date

In any case you will be able to return list of available apartments quite easy.

select
 *
from
 apartments a
 where not exists 
  (select 
     1
   from 
     bookings b 
   where 
      a.apartment_id = b.apartment_id 
      and (
        <<required_start>> between booking_start and booking_end
        or
        <<required_end>> between booking_start and booking_end
        )
like image 198
Tomas Greif Avatar answered Nov 10 '22 06:11

Tomas Greif


I think you will find PostgreSQL's support for range types and associated overlapping tests useful here.

You can index on ranges and even use a constraint to prevent an apartment being booked twice on the same day.

With ranges you can store bookings or availability as you prefer.

like image 32
Richard Huxton Avatar answered Nov 10 '22 06:11

Richard Huxton


The question is in PostgreSQL but I wanted to share what I've done in MySQL and its performance results. I hope this applies to other databases.

Approach

I have created a listings table and listing_availabilities which stores the available range of time the listing is available. It doesn't matter if I create availabilities or 'unavailabilities' because the complement of availabilities is unavailabilities, and thus, the only thing it changes is not the size of dataset but the query conditions.

CREATE TABLE `listings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(64) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 

CREATE TABLE `listing_availabilities` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `listing_id` int(11) NOT NULL,
  `start_time` int(11) NOT NULL,
  `end_time` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `start_time` (`start_time`),
  KEY `end_time` (`end_time`),
  KEY `listing_id` (`listing_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 

There is a concern that if you are going to make that listing available for one year, then you would need to implement 365 rows of data. Suppose you have 1 million listings, then you will have minimum of 365 million rows of listing availabilities. Yes, that's correct. It will either have that many rows or have one row with a large packet that is difficult to filter from. So, the only way to know is to actually test.

Test

I've added 1 million rows to listings and 365 million rows in listing_availabilities, so one row has 365 days of availability ranges.

Here are some practical queries and its performance.

Get

Get all listing's availabilities for specific listing

> SELECT * FROM `listing_availabilities` WHERE `listing_id` = 716384;
> 365 rows in set (0.01 sec)

See whether specific time is available for this listing. 1 if available/0 if not

> SELECT count(*) as exists FROM `listing_availabilities` 
> WHERE `listing_id` = 1234 
> AND 1481527584 BETWEEN `start_time` AND `end_time`
> LIMIT 1;
> 1 row in set (0.00 sec)

Get first 50 listings joining with first available time slot

> SELECT * 
> FROM listings AS l 
> LEFT JOIN listing_availabilities a 
>    ON a.listing_id = l.id 
>   AND a.start_time = (SELECT start_time FROM listing_availabilities WHERE l.id = listing_id ORDER BY start_time ASC LIMIT 1) 
> LIMIT 50;
> 50 rows in set (0.05 sec)

Get first 50 listings where my specific timestamp is available (assuming conditions meet)

> SELECT * 
> FROM listings AS l 
> LEFT JOIN listing_availabilities a 
>    ON  a.listing_id = l.id 
>    AND a.start_time = 
>        (SELECT start_time FROM listing_availabilities WHERE l.id = listing_id AND start_time < 1481536932 AND end_time > 1481536932 LIMIT 1)
> WHERE a.start_time IS NOT NULL 
> LIMIT 50;
> 50 rows in set (0.05 sec)

Get first 50 listings where my specific timestamp is not available (when no availability time is met)

> SELECT * 
> FROM `listings` AS l 
> LEFT JOIN `listing_availabilities` a 
>    ON  a.listing_id = l.id 
>    AND a.start_time = 
>        (SELECT `start_time` FROM `listing_availabilities` WHERE l.id = listing_id AND `start_time` < 1481530494 AND `end_time` > 1481530494 LIMIT 1)
> WHERE a.start_time IS NOT NULL 
> LIMIT 50;
> Empty set (3.00 sec)

This one took much longer and, with the help of EXPLAIN EXTENDED, it seems to scan the whole table when using multiple column inequality in subqueries. I'm still not 100% sure on why this is the case...someone could enlighten me here.

The alternative to this is to left join all availabilities to each listings and then filter from that set:

> SELECT l.id, a.start_time, a.end_time 
> FROM listings as l
> LEFT JOIN listing_availabilities a 
>    ON l.id = a.listing_id 
> WHERE 1481530494 
>    BETWEEN a.start_time 
>    AND a.end_time limit 50;
> Empty set (0.01 sec)

INSERT

Automatically insert listing_availabilities for 365th day

> INSERT INTO `listing_availabilities` 
>    (listing_id, start_time, end_time) 
> VALUES 
>    (8, 1481689555, 1481689556) ...
> Query OK, 500 rows affected (0.01 sec)
> Records: 500  Duplicates: 0  Warnings: 0

DELETE

Remove old availabilities

> DELETE FROM `listing_availabilities` 
> WHERE `end_time` < 1481671237 
> LIMIT 100000;
> Query OK, 100000 rows affected (10.43 sec)

This seems to take a while and locks the DB so I've decided to do it in smaller chunks as such:

> DELETE FROM `listing_availabilities` 
> WHERE end_time < 1481671237 
> LIMIT 500;
> Query OK, 500 rows affected (0.09 sec)

and do this until affected rows become zero.

Obviously, you need a script to remove old availabilities and add new availabilities for months ahead.

This seems to work for my application and it should satisfy the requirements the OP wants. I hope this helps other database schemas.

like image 22
Taku Avatar answered Nov 10 '22 06:11

Taku