I am building an analytics database (I have a firm understanding of the data and the business objectives and only basic-to-moderate database skills).
I have come across some references to building similar warehouses that implement the concept of 'calendar tables'. This makes sense and is easy enough to do. Most examples I see, however, are calendar tables that limit their scope to days. My data will need to be analyzed down to the hour level, possibly to minutes.
My question: would an implementation of calendar tables for hour/minute-level granularity be of value in terms of space-efficiency and query/sorting speed? If so, can you recommend a table structure and population method/example?
My primary data table will contain 20+ million rows of data at any given time and typical subsets for analysis are in the 1 to 5 million range. So, as you can see, that is a lot of timestamp fields.
What is a Calendar Table?
A calendar table is a table containing a single record for every date that you might use in your analysis. Generally speaking, you'll choose a start date (in the past) and an end date (well into the future), then create a record for every date in between.
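As a minimal sketch of that idea (assuming PostgreSQL, which the answers below use, and a hypothetical table name calendar_day), a day-level calendar table can be built in one statement:
create table calendar_day as
select d::date as calendar_date
from generate_series(
       date '2011-01-01',   -- start date (in the past)
       date '2030-12-31',   -- end date (well into the future)
       interval '1 day'
     ) as d;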
In PostgreSQL, you can generate calendar tables of arbitrary length and granularity on the fly:
SELECT CAST('2011-01-01' AS DATE) + (n || ' hour')::INTERVAL
FROM generate_series(0, 23) n
This does not require recursion (unlike some other database systems) and is the preferred method for generating such result sets on the fly.
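The same approach extends to minutes; for example, every minute of a single day as an on-the-fly result set (a sketch, nothing here is stored):
SELECT CAST('2011-01-01' AS TIMESTAMP) + (n || ' minute')::INTERVAL
FROM generate_series(0, 24*60 - 1) n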
Calendar tables implement a space/time tradeoff. By using more space, some kinds of queries run in less time, because they can take advantage of indexes. They're safe as long as you're careful with the CHECK() constraints, and as long as you have administrative processes to take care of any constraints that your dbms doesn't support.
If your granularity is one minute, you'll need to generate about a half million rows for each year. A minimal calendar table would look like this.
2011-01-01 00:00:00
2011-01-01 00:01:00
2011-01-01 00:02:00
2011-01-01 00:03:00
2011-01-01 00:04:00
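As a sketch of how those minute-level rows could be generated for a whole year (using PostgreSQL's timestamp form of generate_series; the date range is illustrative):
-- one row per minute for 2011: 365 * 24 * 60 = 525,600 rows
select ts
from generate_series(
       timestamp '2011-01-01 00:00:00',
       timestamp '2011-12-31 23:59:00',
       interval '1 minute'
     ) as ts;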
If you're doing "bucket" analysis, you might be better off with something like this.
bucket_start bucket_end
--
2011-01-01 00:00:00 2011-01-01 00:01:00
2011-01-01 00:01:00 2011-01-01 00:02:00
2011-01-01 00:02:00 2011-01-01 00:03:00
2011-01-01 00:03:00 2011-01-01 00:04:00
2011-01-01 00:04:00 2011-01-01 00:05:00
Since SQL's BETWEEN operator includes both endpoints, you usually need to avoid it here: it's hard to express bucket_end as "bucket_start plus one minute, minus the smallest bit of time this server can recognize". (The danger is a value that's a microsecond greater than bucket_end, but still less than the next value for bucket_start.)
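For example, a half-open comparison (>= the start, < the end) avoids BETWEEN entirely; this sketch assumes the calendar table defined just below and a hypothetical fact table named events with a column event_ts:
select c.bucket_start, count(*) as events_in_bucket
from calendar c
join events e
  on e.event_ts >= c.bucket_start   -- include the start of the bucket
 and e.event_ts <  c.bucket_end     -- exclude the end of the bucket
group by c.bucket_start
order by c.bucket_start;
With that predicate, every timestamp falls into exactly one bucket, including values with sub-second precision.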
If I were going to build that table, I'd probably do it like this. (Although I'd think harder about whether I should call it "calendar".)
create table calendar (
  bucket_start timestamp primary key,
  bucket_end timestamp unique,
  CHECK (bucket_end = bucket_start + interval '1' minute)
  -- You also want a "no gaps" constraint, but I don't think you
  -- can do that in a CHECK constraint in PostgreSQL. You might
  -- be able to use a trigger that counts the rows, and compares
  -- that count to the number of minutes between min(bucket_start)
  -- and max(bucket_start). Worst case, you can always run a report
  -- that counts the rows and sends you an email.
);
The UNIQUE constraint creates an implicit index in PostgreSQL.
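A sketch of the row-count report mentioned in the comments above (assumes the table is non-empty; if there are no gaps, the two numbers match):
select count(*) as actual_rows,
       extract(epoch from max(bucket_start) - min(bucket_start)) / 60 + 1
         as expected_rows
from calendar;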
This query will insert one day's worth of rows (24 hours * 60 minutes) at a time.
insert into calendar
select coalesce(
         -- continue from the latest bucket already in the table,
         -- or seed from a fixed start date if the table is empty;
         -- n starts at 1, so the first row lands one minute after the seed
         (select max(bucket_start) from calendar),
         cast('2011-01-01 00:00:00' as timestamp)
       )
       + cast((n || ' minute') as interval) as bucket_start,
       coalesce(
         (select max(bucket_start) from calendar),
         cast('2011-01-01 00:00:00' as timestamp)
       )
       + cast((n + 1 || ' minute') as interval) as bucket_end
from generate_series(1, (24*60)) n;
You can wrap that in a function to generate a year at a time. I'd probably try to commit fewer than a half million rows at a time.
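A sketch of such a wrapper, assuming plpgsql and a hypothetical function name add_calendar_days; it simply repeats the one-day insert above, so commit frequency stays with the caller (or use a procedure with COMMIT in PostgreSQL 11+):
create function add_calendar_days(n_days integer) returns void as $$
begin
  for i in 1 .. n_days loop
    -- same one-day insert as above
    insert into calendar
    select base + (n || ' minute')::interval     as bucket_start,
           base + (n + 1 || ' minute')::interval as bucket_end
    from (select coalesce((select max(bucket_start) from calendar),
                          timestamp '2011-01-01 00:00:00') as base) b,
         generate_series(1, 24*60) n;
  end loop;
end;
$$ language plpgsql;
-- example: a year of minutes
select add_calendar_days(365);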
It shouldn't take too long to generate 20 million rows for testing, and another 20 million rows of "calendar" minutes. Long lunch. Maybe an afternoon in the sun.