I am testing a PostgreSQL extension named TimescaleDB for time-series data. If I read the PostgreSQL documentation correctly, a query such as
WHERE x = 'somestring' and timestamp between 't1' and 't2'
will work best with an index on (x, timestamp). Running EXPLAIN on that query confirms that the index is used.
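To see why (x, timestamp) suits this query, here is a minimal Python sketch that models a composite B-tree index as a sorted list of (x, ts) keys. It assumes plain integers in place of timestamps and hypothetical sample rows. The helper names build_index and lookup are illustrative only and are not part of PostgreSQL. Because the keys are ordered by x first, all rows with a given x and a ts between t1 and t2 form one contiguous slice, which two binary searches can find.

```python
from bisect import bisect_left, bisect_right


def build_index(rows):
    """Return (sorted keys, row ids) ordered by the composite key (x, ts)."""
    order = sorted(range(len(rows)), key=lambda i: rows[i])
    keys = [rows[i] for i in order]
    return keys, order


def lookup(index, x, t1, t2):
    """Row ids whose x matches and whose ts lies in [t1, t2] (inclusive, like BETWEEN)."""
    keys, order = index
    lo = bisect_left(keys, (x, t1))   # first key >= (x, t1)
    hi = bisect_right(keys, (x, t2))  # first key >  (x, t2)
    return order[lo:hi]


# Hypothetical rows of (x, ts); integers stand in for timestamps.
rows = [("a", 5), ("b", 1), ("a", 1), ("b", 7), ("a", 9), ("c", 3), ("a", 4), ("b", 4)]
index = build_index(rows)

print(lookup(index, "a", 2, 6))  # → [6, 0], i.e. rows ("a", 4) and ("a", 5)
```

With the column order reversed, (timestamp, x), the index could only narrow the search to the time range and would then have to filter out the non-matching x values inside it.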
When I run the same query on a TimescaleDB hypertable that contains the same data but has no index on (x, timestamp), the performance is about the same, if not better. After I create an index on (x, timestamp), the performance does not improve.
I understand that a hypertable has a built-in timestamp index, so I should probably use a different indexing strategy for it, for example an index on just (x). Is that right?
TimescaleDB achieves a much higher and more stable ingest rate than PostgreSQL for time-series data. As described in our architectural discussion, PostgreSQL's performance begins to suffer significantly as soon as indexed tables no longer fit in memory.
For time-series data, you can create an index on any combination of columns, as long as it includes the time column. TimescaleDB supports all table objects supported within PostgreSQL, including data types, indexes, and triggers.
TimescaleDB offers three key benefits over vanilla PostgreSQL or other traditional RDBMSs for storing time-series data: much higher data ingest rates, especially at larger database sizes; query performance ranging from equivalent to orders of magnitude better; and time-oriented features.
A few things about how TimescaleDB handles queries:
Time-based queries get their main performance improvement from chunk exclusion. Data is partitioned by time into chunks, so when a query for a particular time range is executed, the planner can ignore chunks whose data lies entirely outside that range. Indexes are then used only within the chunks that are actually searched.
If you are searching a time range that spans all chunks, chunk exclusion does not apply, so query times are closer to those of standard PostgreSQL.
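The chunk-exclusion idea can be sketched in a few lines of Python. This is a toy model, not TimescaleDB's implementation: it assumes integer timestamps, a fixed chunk interval, and a hypothetical Chunk class, and it scans rows inside each chunk linearly where TimescaleDB would use that chunk's indexes.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    start: int  # inclusive lower bound of this chunk's time range
    end: int    # exclusive upper bound
    rows: list = field(default_factory=list)


def partition(rows, interval):
    """Split (ts, value) rows into chunks of fixed time width, ordered by start."""
    chunks = {}
    for ts, value in rows:
        start = ts - ts % interval  # align ts down to its chunk boundary
        if start not in chunks:
            chunks[start] = Chunk(start, start + interval)
        chunks[start].rows.append((ts, value))
    return [chunks[s] for s in sorted(chunks)]


def query(chunks, t1, t2):
    """Return rows with t1 <= ts <= t2 and the number of chunks searched."""
    # Chunk exclusion: skip every chunk whose range cannot overlap [t1, t2].
    searched = [c for c in chunks if c.start <= t2 and c.end > t1]
    # Toy linear filter inside each remaining chunk (a real system would use an index).
    hits = [r for c in searched for r in c.rows if t1 <= r[0] <= t2]
    return hits, len(searched)


# Hypothetical data: one row every 5 time units, chunk interval of 25.
chunks = partition([(ts, f"v{ts}") for ts in range(0, 100, 5)], interval=25)

hits, searched = query(chunks, 30, 40)
print(len(hits), searched)  # → 3 1

hits, searched = query(chunks, 0, 99)
print(len(hits), searched)  # → 20 4
```

A narrow range touches one chunk out of four, while a range covering all the data searches every chunk, which is the case where performance approaches that of plain PostgreSQL.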
If your query matches a large fraction of the rows in the chunks being scanned, the query planner may choose a sequential scan instead of an index scan to save on I/O operations (see https://github.com/timescale/timescaledb/issues/317).
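The trade-off between the two scan types can be illustrated with a deliberately crude cost model. The constants below are PostgreSQL's default seq_page_cost and random_page_cost, but the formulas are a simplifying assumption (one random page read per matching row through the index), not the planner's actual costing. choose_scan is a hypothetical helper.

```python
SEQ_PAGE_COST = 1.0     # PostgreSQL's default seq_page_cost
RANDOM_PAGE_COST = 4.0  # PostgreSQL's default random_page_cost


def choose_scan(total_rows, rows_per_page, selectivity):
    """Toy planner: pick the cheaper of a full sequential scan and an index scan.

    Assumes (pessimistically) that every matching row costs one random page
    read via the index; the real PostgreSQL cost model is far more detailed.
    """
    pages = -(-total_rows // rows_per_page)  # ceiling division
    seq_cost = pages * SEQ_PAGE_COST
    index_cost = total_rows * selectivity * RANDOM_PAGE_COST
    return "index scan" if index_cost < seq_cost else "seq scan"


print(choose_scan(1_000_000, 100, 0.001))  # → index scan
print(choose_scan(1_000_000, 100, 0.5))    # → seq scan
```

With 1,000,000 rows at 100 rows per page, this model prefers the index only while fewer than about 0.25% of rows match. Beyond that, reading every page sequentially is cheaper.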
There is nothing inherently special about the built-in indexes: you can drop them after creating the hypertable, or turn them off when calling create_hypertable (see the TimescaleDB API docs).