The Problem:
I have time-related data in my database and I am struggling to organize, structure and index that data in a way that lets users retrieve it efficiently; even simple database queries take longer than acceptable.
Project Context:
While this is a pure database question, some context might help to understand the data model:
The project centers around doing research on a big, complex machine. I don't know a lot about the machine itself, but rumour in the lab has it there's a flux capacitor in there somewhere - and I think yesterday, I spotted the tail of Schrödinger's cat hanging out of it at the side ;-)
We measure many different parameters while the machine is running using sensors positioned all over the machine at different measurement points (so-called spots) at certain intervals over a period of time. We use not only one device to measure these parameters, but a whole range of them; they differ in the quality of their measurement data (I think this involves sample rates, sensor quality, price and many other aspects that I'm not concerned with); one aim of the project actually is to establish a comparison between these devices. You can visualize these measurement devices as a bunch of lab trolleys, each with a lot of cables connected to the machine, each delivering measurement data.
The Data Model:
There is measurement data from every spot and every device for every parameter, for example once a minute over a period of 6 days. My job is to store that data in a database and to provide efficient access to it.
In a nutshell:
- device: a measurement device, identified by a unique name
- parameter: a measured parameter; it needs an ID because parameter names are not unique
- spot: a measurement point on the machine
- measurement_data_index: one entry per device, spot and timestamp
- measurement_data_value: one measured value per parameter, linked to an index entry
The project database is more complex of course, but these details don't seem relevant to the issue.
Initially, I had modeled the measurement data value to have its own ID as primary key; the n:m relationship between measurement data index and value was a separate table that only stored index:value ID pairs. As that table itself consumed quite a lot of hard drive space, we eliminated it and changed the value ID to a simple integer that stores the ID of the measurement data index it belongs to; the primary key of the measurement data value is now composed of that ID and the parameter ID.
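For illustration, the eliminated intermediate table looked roughly like this (reconstructed from the description above; the table and column names are my own, not the originals):
-- hypothetical reconstruction of the dropped n:m link table;
-- it only stored index ID : value ID pairs
CREATE TABLE measurement_data_link
(
  fk_index_id INTEGER NOT NULL,
  fk_value_id INTEGER NOT NULL,
  CONSTRAINT measurement_data_link_pk PRIMARY KEY (fk_index_id, fk_value_id)
);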
On a side note: When I created the data model, I carefully followed common design guidelines like 3NF and appropriate table constraints (such as unique keys); another rule of thumb was to create an index for every foreign key. I have a suspicion that the deviation in the measurement data index / value tables from 'strict' 3NF might be one of the reasons for the performance issues I am looking at now, but changing the data model back has not solved the problem.
The Data Model in DDL:
NOTE: There is an update to this code further below.
The script below creates the database and all tables involved. Please note that there are no explicit indexes yet. Before you run this, please make sure you don't happen to already have a database called so_test with any valuable data...
\c postgres
DROP DATABASE IF EXISTS so_test;
CREATE DATABASE so_test;
\c so_test
CREATE TABLE device
(
name VARCHAR(16) NOT NULL,
CONSTRAINT device_pk PRIMARY KEY (name)
);
CREATE TABLE parameter
(
-- must have ID as names are not unique
id SERIAL,
name VARCHAR(64) NOT NULL,
CONSTRAINT parameter_pk PRIMARY KEY (id)
);
CREATE TABLE spot
(
id SERIAL,
CONSTRAINT spot_pk PRIMARY KEY (id)
);
CREATE TABLE measurement_data_index
(
id SERIAL,
fk_device_name VARCHAR(16) NOT NULL,
fk_spot_id INTEGER NOT NULL,
t_stamp TIMESTAMP NOT NULL,
CONSTRAINT measurement_pk PRIMARY KEY (id),
CONSTRAINT measurement_data_index_fk_2_device FOREIGN KEY (fk_device_name)
REFERENCES device (name) MATCH FULL
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT measurement_data_index_fk_2_spot FOREIGN KEY (fk_spot_id)
REFERENCES spot (id) MATCH FULL
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT measurement_data_index_uk_all_cols UNIQUE (fk_device_name, fk_spot_id, t_stamp)
);
CREATE TABLE measurement_data_value
(
id INTEGER NOT NULL,
fk_parameter_id INTEGER NOT NULL,
value VARCHAR(16) NOT NULL,
CONSTRAINT measurement_data_value_pk PRIMARY KEY (id, fk_parameter_id),
CONSTRAINT measurement_data_value_fk_2_parameter FOREIGN KEY (fk_parameter_id)
REFERENCES parameter (id) MATCH FULL
ON UPDATE NO ACTION ON DELETE NO ACTION
);
I have also created a script to fill the table with some test data:
CREATE OR REPLACE FUNCTION insert_data()
RETURNS VOID
LANGUAGE plpgsql
AS
$BODY$
DECLARE
t_stamp TIMESTAMP := '2012-01-01 00:00:00';
index_id INTEGER;
param_id INTEGER;
dev_name VARCHAR(16);
value VARCHAR(16);
BEGIN
FOR dev IN 1..5
LOOP
INSERT INTO device (name) VALUES ('dev_' || to_char(dev, 'FM00'));
END LOOP;
FOR param IN 1..20
LOOP
INSERT INTO parameter (name) VALUES ('param_' || to_char(param, 'FM00'));
END LOOP;
FOR spot IN 1..10
LOOP
INSERT INTO spot (id) VALUES (spot);
END LOOP;
WHILE t_stamp < '2012-01-07 00:00:00'
LOOP
FOR dev IN 1..5
LOOP
dev_name := 'dev_' || to_char(dev, 'FM00');
FOR spot IN 1..10
LOOP
INSERT INTO measurement_data_index
(fk_device_name, fk_spot_id, t_stamp)
VALUES (dev_name, spot, t_stamp) RETURNING id INTO index_id;
FOR param IN 1..20
LOOP
SELECT id INTO param_id FROM parameter
WHERE name = 'param_' || to_char(param, 'FM00');
value := 'd' || to_char(dev, 'FM00')
|| '_s' || to_char(spot, 'FM00')
|| '_p' || to_char(param, 'FM00');
INSERT INTO measurement_data_value (id, fk_parameter_id, value)
VALUES (index_id, param_id, value);
END LOOP;
END LOOP;
END LOOP;
t_stamp := t_stamp + '1 minute'::INTERVAL;
END LOOP;
END;
$BODY$;
SELECT insert_data();
The PostgreSQL query planner requires up to date statistics, so analyze all tables. Vacuuming might not be required, but do it anyway:
VACUUM ANALYZE device;
VACUUM ANALYZE measurement_data_index;
VACUUM ANALYZE measurement_data_value;
VACUUM ANALYZE parameter;
VACUUM ANALYZE spot;
A Sample Query:
If I now run a really simple query to e.g. obtain all values for a certain parameter, it already takes a couple of seconds, although the database is not very large yet:
EXPLAIN (ANALYZE ON, BUFFERS ON)
SELECT measurement_data_value.value
FROM measurement_data_value, parameter
WHERE measurement_data_value.fk_parameter_id = parameter.id
AND parameter.name = 'param_01';
Exemplary result on my development machine (please see below for some details on my environment):
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.26..178153.26 rows=432000 width=12) (actual time=0.046..2281.281 rows=432000 loops=1)
Hash Cond: (measurement_data_value.fk_parameter_id = parameter.id)
Buffers: shared hit=55035
-> Seq Scan on measurement_data_value (cost=0.00..141432.00 rows=8640000 width=16) (actual time=0.004..963.999 rows=8640000 loops=1)
Buffers: shared hit=55032
-> Hash (cost=1.25..1.25 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
Buffers: shared hit=1
-> Seq Scan on parameter (cost=0.00..1.25 rows=1 width=4) (actual time=0.004..0.008 rows=1 loops=1)
Filter: ((name)::text = 'param_01'::text)
Buffers: shared hit=1
Total runtime: 2313.615 ms
(12 rows)
There are no indexes in the database apart from the implicit ones, so it's not surprising the planner does sequential scans only. If I follow what seems to be a rule of thumb and add btree indexes for every foreign key like
CREATE INDEX measurement_data_index_idx_fk_device_name
ON measurement_data_index (fk_device_name);
CREATE INDEX measurement_data_index_idx_fk_spot_id
ON measurement_data_index (fk_spot_id);
CREATE INDEX measurement_data_value_idx_fk_parameter_id
ON measurement_data_value (fk_parameter_id);
then do another vacuum analyze (just to be safe) and re-run the query, the planner uses bitmap heap and bitmap index scans and the total query time somewhat improves:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=8089.19..72842.42 rows=431999 width=12) (actual time=66.773..1336.517 rows=432000 loops=1)
Buffers: shared hit=55033 read=1184
-> Seq Scan on parameter (cost=0.00..1.25 rows=1 width=4) (actual time=0.005..0.012 rows=1 loops=1)
Filter: ((name)::text = 'param_01'::text)
Buffers: shared hit=1
-> Bitmap Heap Scan on measurement_data_value (cost=8089.19..67441.18 rows=431999 width=16) (actual time=66.762..1237.488 rows=432000 loops=1)
Recheck Cond: (fk_parameter_id = parameter.id)
Buffers: shared hit=55032 read=1184
-> Bitmap Index Scan on measurement_data_value_idx_fk_parameter_id (cost=0.00..7981.19 rows=431999 width=0) (actual time=65.222..65.222 rows=432000 loops=1)
Index Cond: (fk_parameter_id = parameter.id)
Buffers: shared read=1184
Total runtime: 1371.716 ms
(12 rows)
However, this is still more than a second of execution time for a really simple query.
What I have done so far:
I have looked into partitioning the value table. The data is time-related and partitioning seems an appropriate means to organize that kind of data; even the examples in the PostgreSQL documentation use something similar. However, I read in the same article:
The benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.
The entire test database is less than 1GB in size and I am running my tests on a development machine with 8GB of RAM and on a virtual machine with 1GB (see also environment below), so the table is far from being very large or even exceeding the physical memory. I might implement partitioning anyway at some stage, but I have a feeling that approach does not target the performance problem itself.
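For reference, a minimal sketch of what inheritance-based partitioning (the mechanism available in PostgreSQL 9.x) could look like for the index table, which carries the timestamp in this model; the daily granularity and the child table name are assumptions for illustration only:
-- one child table per day; a trigger or the loading job routes rows here
CREATE TABLE measurement_data_index_2012_01_01
(
  CHECK (t_stamp >= TIMESTAMP '2012-01-01' AND t_stamp < TIMESTAMP '2012-01-02')
) INHERITS (measurement_data_index);
-- with constraint_exclusion = partition (the default), queries that filter
-- on t_stamp skip child tables whose CHECK constraint rules them out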
Furthermore, I am considering clustering the value table. I dislike the fact that clustering must be re-done whenever new data is inserted and that it requires an exclusive read/write lock, but looking at this SO question, it seems that it has its benefits anyway and might be an option. However, clustering is done on an index, and as there are up to 4 selection criteria going into a query (devices, spots, parameters and time), I would have to create clusters for all of them - which in turn gives me the impression that I'm simply not creating the right indexes...
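For completeness, clustering the value table on the parameter index created above would look like this; it physically rewrites the table in fk_parameter_id order and has to be repeated after bulk inserts:
CLUSTER measurement_data_value USING measurement_data_value_idx_fk_parameter_id;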
My Environment:
I run my tests on two machines: my development machine, a MacBook with 8GB of RAM, and a virtual Debian 6.0 machine with 1GB of RAM. On both machines I increased shared_buffers from the default 1600kB to 25% of RAM, as recommended in the PostgreSQL docs (which involved enlarging kernel settings like SHMALL, SHMMAX, etc.).
NOTE: One aspect that I believe is important is that the performance test series with real queries from the project do not differ performance-wise between the MacBook with 8GB and the virtual machine with 1GB; i.e. if a query takes 10s on the MacBook, it also takes 10s on the VM. Also, I ran the same performance tests before and after changing shared_buffers, effective_cache_size and work_mem, and the configuration changes did not improve performance by more than 10%; some results in fact even got worse, so it seems any difference is caused rather by test variation than by configuration change. These observations lead me to believe that RAM and postgresql.conf settings are not the limiting factors here yet.
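To verify that the changed settings are actually in effect in a session, a quick check from psql (nothing project-specific here):
SHOW shared_buffers;
SHOW effective_cache_size;
SHOW work_mem;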
My Questions:
I don't know if different or additional indexes would speed up the query and, if they did, which ones to create. Looking at the size of the database and how simple my query is, I have the impression there is something fundamentally wrong with my data model or with how I have chosen my indexes so far.
Does anyone have some advice for me on how to structure and index time-related data like mine to improve query performance?
Asked more broadly: how do I get this database to fly?
Update 01:
Looking at the responses so far, I think I have not explained the need for measurement data index / values tables properly, so let me try again. Storage space is the issue here.
NOTE: The numbers used here (and below) are only for illustration; the real project works with different figures.
Assuming we take one measurement per minute with 10 devices at 10 spots for 10 parameters, this adds up to
1 meas/min x 60 min/hour x 24 hour/day = 1440 meas/day
Each measurement has data from every spot and every device for every parameter, so
10 spots x 10 devices x 10 parameters = 1000 data sets/meas
So in total
1440 meas/day x 1000 data sets/meas = 1 440 000 data sets/day
If we store all measurements in a single table as Catcall suggested, e.g.
CREATE TABLE measurement_data
(
device_name character varying(16) NOT NULL,
spot_id integer NOT NULL,
parameter_id integer NOT NULL,
t_stamp timestamp without time zone NOT NULL,
value character varying(16) NOT NULL,
-- constraints...
);
a single row would add up to
17 + 4 + 4 + 8 + 17 = 50 bytes/row
in the worst case where all varchar fields are fully filled. This amounts to
50 bytes/row x 1 440 000 rows/day = 72 000 000 bytes/day
or ~69 MB per day.
While this does not sound a lot, the storage space requirement in the real database would be prohibitive (again, the numbers used here are only for illustration). We have therefore split measurement data into an index and a value table as explained earlier in the question:
CREATE TABLE measurement_data_index
(
id SERIAL,
fk_device_name VARCHAR(16) NOT NULL,
fk_spot_id INTEGER NOT NULL,
t_stamp TIMESTAMP NOT NULL,
-- constraints...
);
CREATE TABLE measurement_data_value
(
id INTEGER NOT NULL,
fk_parameter_id INTEGER NOT NULL,
value VARCHAR(16) NOT NULL,
-- constraints...
);
where the ID of a value row is equal to the ID of the index it belongs to.
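For clarity, reassembling a complete measurement row from the split tables is a join on that shared ID (a sketch against the tables defined above):
SELECT idx.fk_device_name
      ,idx.fk_spot_id
      ,idx.t_stamp
      ,val.fk_parameter_id
      ,val.value
FROM   measurement_data_index idx
JOIN   measurement_data_value val ON val.id = idx.id;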
The sizes of a row in the index and value tables are
index: 4 + 17 + 4 + 8 = 33 bytes
value: 4 + 4 + 17 = 25 bytes
(again, worst case scenario). The total amount of rows is
index: 10 devices x 10 spots x 1440 meas/day = 144 000 rows/day
value: 10 parameters x 144 000 rows/day = 1 440 000 rows/day
so the total is
index: 33 bytes/row x 144 000 rows/day = 4 752 000 bytes/day
value: 25 bytes/row x 1 440 000 rows/day = 36 000 000 bytes/day
total: 40 752 000 bytes/day
or ~39 MB per day - as opposed to ~69 MB for a single table solution.
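To compare these back-of-the-envelope figures with what actually ends up on disk, PostgreSQL's size functions can be used (a sketch; note that pg_total_relation_size() also counts indexes and TOAST data, so it reports more than the raw row estimates above):
SELECT pg_size_pretty(pg_total_relation_size('measurement_data_index')) AS index_table,
       pg_size_pretty(pg_total_relation_size('measurement_data_value')) AS value_table;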
Update 02 (re: wildplasser's response):
This question is getting pretty long as it is, so I was considering updating the code in place in the original question above, but I think it might help to have both the first and the improved solutions in here to better see the differences.
Changes compared to the original approach (somewhat in order of importance):
- Move the t_stamp field from the measurement_data_index table to measurement_data_value, and move the fk_parameter_id field from the value to the index table: With this change, all fields in the index table are constant and new measurement data is written to the value table only. I did not expect any major query performance improvement from this (I was wrong), but I feel it makes the measurement data index concept clearer. While it requires fractionally more storage space (according to some rather crude estimate), having a 'static' index table might also help in deployment when tablespaces are moved to different hard drives according to their read/write requirements.
- Rewrite insert_data(): Use generate_series() instead of nested FOR loops; it makes the code much 'snappier'.
\c postgres
DROP DATABASE IF EXISTS so_test_03;
CREATE DATABASE so_test_03;
\c so_test_03
CREATE TABLE device
(
id SERIAL,
name VARCHAR(16) NOT NULL,
CONSTRAINT device_pk PRIMARY KEY (id),
CONSTRAINT device_uk_name UNIQUE (name)
);
CREATE TABLE parameter
(
id SERIAL,
name VARCHAR(64) NOT NULL,
CONSTRAINT parameter_pk PRIMARY KEY (id)
);
CREATE TABLE spot
(
id SERIAL,
name VARCHAR(16) NOT NULL,
CONSTRAINT spot_pk PRIMARY KEY (id)
);
CREATE TABLE measurement_data_index
(
id SERIAL,
fk_device_id INTEGER NOT NULL,
fk_parameter_id INTEGER NOT NULL,
fk_spot_id INTEGER NOT NULL,
CONSTRAINT measurement_pk PRIMARY KEY (id),
CONSTRAINT measurement_data_index_fk_2_device FOREIGN KEY (fk_device_id)
REFERENCES device (id) MATCH FULL
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT measurement_data_index_fk_2_parameter FOREIGN KEY (fk_parameter_id)
REFERENCES parameter (id) MATCH FULL
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT measurement_data_index_fk_2_spot FOREIGN KEY (fk_spot_id)
REFERENCES spot (id) MATCH FULL
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT measurement_data_index_uk_all_cols UNIQUE (fk_device_id, fk_parameter_id, fk_spot_id)
);
CREATE TABLE measurement_data_value
(
id INTEGER NOT NULL,
t_stamp TIMESTAMP NOT NULL,
value VARCHAR(16) NOT NULL,
-- NOTE: inverse field order compared to wildplasser's version
CONSTRAINT measurement_data_value_pk PRIMARY KEY (id, t_stamp),
CONSTRAINT measurement_data_value_fk_2_index FOREIGN KEY (id)
REFERENCES measurement_data_index (id) MATCH FULL
ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE OR REPLACE FUNCTION insert_data()
RETURNS VOID
LANGUAGE plpgsql
AS
$BODY$
BEGIN
INSERT INTO device (name)
SELECT 'dev_' || to_char(item, 'FM00')
FROM generate_series(1, 5) item;
INSERT INTO parameter (name)
SELECT 'param_' || to_char(item, 'FM00')
FROM generate_series(1, 20) item;
INSERT INTO spot (name)
SELECT 'spot_' || to_char(item, 'FM00')
FROM generate_series(1, 10) item;
INSERT INTO measurement_data_index (fk_device_id, fk_parameter_id, fk_spot_id)
SELECT device.id, parameter.id, spot.id
FROM device, parameter, spot;
INSERT INTO measurement_data_value(id, t_stamp, value)
SELECT index.id,
item,
'd' || to_char(index.fk_device_id, 'FM00') ||
'_s' || to_char(index.fk_spot_id, 'FM00') ||
'_p' || to_char(index.fk_parameter_id, 'FM00')
FROM measurement_data_index index,
generate_series('2012-01-01 00:00:00', '2012-01-06 23:59:59', interval '1 min') item;
END;
$BODY$;
SELECT insert_data();
At some stage, I will change my own conventions to using inline PRIMARY KEY and REFERENCES statements instead of explicit CONSTRAINTs; for the moment, I think keeping this the way it was makes it easier to compare the two solutions.
Don't forget to update statistics for the query planner:
VACUUM ANALYZE device;
VACUUM ANALYZE measurement_data_index;
VACUUM ANALYZE measurement_data_value;
VACUUM ANALYZE parameter;
VACUUM ANALYZE spot;
Run a query that should produce the same result as the one in the first approach:
EXPLAIN (ANALYZE ON, BUFFERS ON)
SELECT measurement_data_value.value
FROM measurement_data_index,
measurement_data_value,
parameter
WHERE measurement_data_index.fk_parameter_id = parameter.id
AND measurement_data_index.id = measurement_data_value.id
AND parameter.name = 'param_01';
Result:
Nested Loop (cost=0.00..34218.28 rows=431998 width=12) (actual time=0.026..696.349 rows=432000 loops=1)
Buffers: shared hit=435332
-> Nested Loop (cost=0.00..29.75 rows=50 width=4) (actual time=0.012..0.453 rows=50 loops=1)
Join Filter: (measurement_data_index.fk_parameter_id = parameter.id)
Buffers: shared hit=7
-> Seq Scan on parameter (cost=0.00..1.25 rows=1 width=4) (actual time=0.005..0.010 rows=1 loops=1)
Filter: ((name)::text = 'param_01'::text)
Buffers: shared hit=1
-> Seq Scan on measurement_data_index (cost=0.00..16.00 rows=1000 width=8) (actual time=0.003..0.187 rows=1000 loops=1)
Buffers: shared hit=6
-> Index Scan using measurement_data_value_pk on measurement_data_value (cost=0.00..575.77 rows=8640 width=16) (actual time=0.013..12.157 rows=8640 loops=50)
Index Cond: (id = measurement_data_index.id)
Buffers: shared hit=435325
Total runtime: 726.125 ms
This is almost half of the ~1.3s the first approach required; considering I'm loading 432K rows, it is a result I can live with for the moment.
NOTE: The field order in the value table PK is id, t_stamp; the order in wildplasser's response is t_stamp, whw_id. I did it that way because I feel a 'regular' field order is the one in which fields are listed in the table declaration (and 'reverse' is then the other way around), but that's just my own convention that keeps me from getting confused. Either way, as Erwin Brandstetter pointed out, this order is absolutely critical for the performance improvement; if it is the wrong way around (and a reverse index as in wildplasser's solution is missing), the query plan looks like below and performance is more than 3 times worse:
Hash Join (cost=22.14..186671.54 rows=431998 width=12) (actual time=0.460..2570.941 rows=432000 loops=1)
Hash Cond: (measurement_data_value.id = measurement_data_index.id)
Buffers: shared hit=63537
-> Seq Scan on measurement_data_value (cost=0.00..149929.58 rows=8639958 width=16) (actual time=0.004..1095.606 rows=8640000 loops=1)
Buffers: shared hit=63530
-> Hash (cost=21.51..21.51 rows=50 width=4) (actual time=0.446..0.446 rows=50 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 2kB
Buffers: shared hit=7
-> Hash Join (cost=1.26..21.51 rows=50 width=4) (actual time=0.015..0.359 rows=50 loops=1)
Hash Cond: (measurement_data_index.fk_parameter_id = parameter.id)
Buffers: shared hit=7
-> Seq Scan on measurement_data_index (cost=0.00..16.00 rows=1000 width=8) (actual time=0.002..0.135 rows=1000 loops=1)
Buffers: shared hit=6
-> Hash (cost=1.25..1.25 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
Buffers: shared hit=1
-> Seq Scan on parameter (cost=0.00..1.25 rows=1 width=4) (actual time=0.004..0.007 rows=1 loops=1)
Filter: ((name)::text = 'param_01'::text)
Buffers: shared hit=1
Total runtime: 2605.277 ms
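For reference, the 'reverse index' that would rescue the reversed PK order is simply an index leading with id; a sketch with a name of my own choosing (redundant if the PK is already (id, t_stamp)):
CREATE INDEX measurement_data_value_idx_id_t_stamp
  ON measurement_data_value (id, t_stamp);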
An Answer:
I basically revised your whole setup. Tested under PostgreSQL 9.1.5.
I think that your table layout has a major logical flaw (as also pointed out by @Catcall). I changed it the way I suspect it should be:
Your last table measurement_data_value (which I renamed to measure_val) is supposed to save a value per parameter (now: param) for every row in measurement_data_index (now: measure). See below.
Even though "a device has a unique name", use an integer surrogate primary key anyway. Text strings are inherently bulkier and slower to use as foreign keys in big tables. They are also subject to collation, which can slow down queries significantly.
Under this related question we found that joining and sorting on a medium-sized text column was the major slow-down.
If you insist on using a text string as primary key, read up on collation support in PostgreSQL 9.1 or later.
Don't fall for the anti-pattern of using id as the name for a primary key. When you join a couple of tables (like you will have to do a lot!), you end up with several columns named id - what a mess! (Sadly, some ORMs do this.)
Instead, name a surrogate primary key column after the table somehow, to make it meaningful on its own. Then the foreign keys referencing it can have the same name (that's good, as they contain the same data):
CREATE TABLE spot
( spot_id SERIAL PRIMARY KEY);
Don't use super-long identifiers. They are hard to type and hard to read. Rule of thumb: as long as necessary to be clear, as short as possible.
Don't use varchar(n) if you don't have a compelling reason. Just use varchar, or simpler: just text.
All this and more went into my proposal for a better db schema:
CREATE TABLE device
( device_id serial PRIMARY KEY
,device text NOT NULL
);
CREATE TABLE param
( param_id serial PRIMARY KEY
,param text NOT NULL
);
CREATE INDEX param_param_idx ON param (param); -- you are looking up by name!
CREATE TABLE spot
( spot_id serial PRIMARY KEY);
CREATE TABLE measure
( measure_id serial PRIMARY KEY
,device_id int NOT NULL REFERENCES device (device_id) ON UPDATE CASCADE
,spot_id int NOT NULL REFERENCES spot (spot_id) ON UPDATE CASCADE
,t_stamp timestamp NOT NULL
,CONSTRAINT measure_uni UNIQUE (device_id, spot_id, t_stamp)
);
CREATE TABLE measure_val -- better name?
( measure_id int NOT NULL REFERENCES measure (measure_id)
ON UPDATE CASCADE ON DELETE CASCADE -- guessing it fits
,param_id int NOT NULL REFERENCES param (param_id)
ON UPDATE CASCADE ON DELETE CASCADE -- guessing it fits
,value text NOT NULL
,CONSTRAINT measure_val_pk PRIMARY KEY (measure_id, param_id)
);
CREATE INDEX measure_val_param_id_idx ON measure_val (param_id); -- !crucial!
I renamed the bulky measurement_data_value to measure_val, because that's what's in the table: parameter values for measurements. Now the multi-column pk makes sense, too.
But I added a separate index on param_id. The way you had it, column param_id was the second column in a multi-column index, which leads to poor results for lookups on param_id alone. Read all the gory details about that under this related question on dba.SE.
After implementing this alone, your query should be faster. But there is more you can do.
The following function fills in the data much faster. The point is that I use set-based DML commands, executing mass-inserts instead of loops that execute individual inserts, which take forever. That makes quite a difference for the considerable amount of test data you want to insert. It's also much shorter and simpler.
To make it even more efficient, I use a data-modifying CTE (new in Postgres 9.1) that instantly reuses the massive amount of rows in the last step.
CREATE OR REPLACE FUNCTION insert_data()
RETURNS void LANGUAGE plpgsql AS
$BODY$
BEGIN
INSERT INTO device (device)
SELECT 'dev_' || to_char(g, 'FM00')
FROM generate_series(1,5) g;
INSERT INTO param (param)
SELECT 'param_' || to_char(g, 'FM00')
FROM generate_series(1,20) g;
INSERT INTO spot (spot_id)
SELECT nextval('spot_spot_id_seq'::regclass)
FROM generate_series(1,10) g; -- to set sequence, too
WITH x AS (
INSERT INTO measure (device_id, spot_id, t_stamp)
SELECT d.device_id, s.spot_id, g
FROM device d
CROSS JOIN spot s
CROSS JOIN generate_series('2012-01-06 23:00:00' -- smaller set
,'2012-01-07 00:00:00' -- for quick tests
,interval '1 min') g
RETURNING *
)
INSERT INTO measure_val (measure_id, param_id, value)
SELECT x.measure_id
,p.param_id
,x.device_id || '_' || x.spot_id || '_' || p.param
FROM x
CROSS JOIN param p;
END
$BODY$;
Call:
SELECT insert_data();
Use explicit JOIN syntax and table aliases to make your queries easier to read and debug:
SELECT v.value
FROM   param p
JOIN   measure_val v USING (param_id)
WHERE  p.param = 'param_01';
The USING clause is just for simplifying the syntax; it is not superior to ON otherwise.
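The same query spelled out with ON, just to show the two forms are interchangeable here:
SELECT v.value
FROM   param p
JOIN   measure_val v ON v.param_id = p.param_id
WHERE  p.param = 'param_01';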
This should be much faster now for two reasons:
- the index param_param_idx on param.param
- the index measure_val_param_id_idx on measure_val.param_id, as explained in detail here
My major oversight was that you had already added the crucial index in the form of measurement_data_value_idx_fk_parameter_id further down in your question. (I blame your cryptic names! :p) On closer inspection, you have 8.64M (6 * 24 * 60 * 5 * 10 * 20) rows in your test setup and your query retrieves 432K of them. I only tested with a much smaller subset.
Also, as you retrieve 5% of the whole table, indexes will only go so far. I was too optimistic; that amount of data is bound to take some time. Is it a realistic requirement that you query over 400k rows at once? I would assume you aggregate in your real-life application?
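If aggregation is indeed what the application needs, pushing it into the query keeps the result set small. A sketch against the revised schema; the daily grouping, the count and param_id = 1 are placeholders, not anything from the project:
SELECT m.t_stamp::date AS day
      ,count(*)        AS n_values
FROM   measure m
JOIN   measure_val v USING (measure_id)
WHERE  v.param_id = 1   -- hypothetical parameter of interest
GROUP  BY 1
ORDER  BY 1;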
Further things that could help:
- More RAM, and settings that make use of it. A virtual Debian 6.0 machine with 1GB of RAM is way below what you need.
- Partial indexes, especially in connection with the index-only scans of PostgreSQL 9.2 (see the sketch below).
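A partial index would look something like this (purely illustrative; the predicate depends on which parameters are queried most often, and param_id = 1 is just a placeholder):
-- covers only rows for one frequently queried parameter
CREATE INDEX measure_val_param01_idx ON measure_val (measure_id)
WHERE  param_id = 1;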
If your queries are predominantly by parameter (like the example), you could use CLUSTER to physically rewrite the table according to an index:
CLUSTER measure_val USING measure_val_param_id_idx;
This way all rows for one parameter are stored in succession. That means fewer blocks to read and easier caching, which should make the query at hand much faster. Or INSERT the rows in a favorable order to begin with, to the same effect.
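One way to insert in favorable order (a sketch only, reusing the names of the revised schema): sort the bulk insert by param_id so that rows for one parameter land next to each other from the start:
INSERT INTO measure_val (measure_id, param_id, value)
SELECT m.measure_id
      ,p.param_id
      ,m.device_id || '_' || m.spot_id || '_' || p.param
FROM   measure m
CROSS  JOIN param p
ORDER  BY p.param_id, m.measure_id;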
Partitioning would mix well with CLUSTER, since you would not have to rewrite the whole (huge) table every time. As your data is obviously just inserted and not updated, a partition would stay "in order" after CLUSTER.
Generally, PostgreSQL 9.2 should be great for you as its improvements focus on performance with big data.