Do TimescaleDB indexes work the same as in PostgreSQL?

I am testing a PostgreSQL extension named TimescaleDB for time-series data. If I read the PostgreSQL documentation correctly, a query with a condition such as

WHERE x = 'somestring' and timestamp between 't1' and 't2'

will work best with an index on (x, timestamp). Running EXPLAIN on that query confirms that the index is used.
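A minimal sketch of that plain PostgreSQL setup, for concreteness. The table name `readings`, the `value` column, the index name, and the date literals are placeholders chosen for illustration, not the real schema:

```sql
-- Hypothetical table; only the columns x and "timestamp" come from the question.
CREATE TABLE readings (
    x           text             NOT NULL,
    "timestamp" timestamptz      NOT NULL,
    value       double precision
);

-- Composite index: the equality column first, the range column second.
CREATE INDEX readings_x_timestamp_idx ON readings (x, "timestamp");

-- Show the plan the planner picks for the equality-plus-range query.
EXPLAIN
SELECT *
FROM readings
WHERE x = 'somestring'
  AND "timestamp" BETWEEN '2018-05-01' AND '2018-05-31';
```

Whether the plan shows an index scan on `readings_x_timestamp_idx` depends on table size and statistics, so the output is not reproduced here.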

When I run the same query on a TimescaleDB hypertable that contains the same data but has no (x, timestamp) index, the performance is about the same (if not better). After I create an index on (x, timestamp), the performance does not improve.

I understand that a hypertable has a built-in timestamp index. So should I use a different indexing strategy for the hypertable, for example an index on just (x)? Is that right?

Maxi Wu asked May 31 '18 10:05


1 Answer

A few things about how TimescaleDB handles queries:

  1. The primary way that time-based queries get improved performance is through chunk exclusion. Data is partitioned by time into chunks, so when a query for a particular time range is executed, the planner can ignore chunks whose data falls entirely outside that range. Indexes are then used only within the chunks that are actually searched.

    If you are searching a time range that spans all chunks, chunk exclusion does not help, so you get query times closer to those of standard PostgreSQL.

  2. If your query matches a large share of the rows in the chunks being scanned, the query planner may choose a sequential scan instead of an index scan to save on I/O operations (see https://github.com/timescale/timescaledb/issues/317).

  3. There is nothing inherently special about the built-in indexes: you can drop them after creating the hypertable, or turn them off when calling create_hypertable (see the Timescale API docs).
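To check these points on your own data, something along the following lines may help. This is only a sketch: the table name `readings_ht`, the `value` column, the index name, and the date literals are made up for illustration, and it assumes a TimescaleDB version that accepts the classic `create_hypertable(relation, time_column_name, ...)` call with the `create_default_indexes` option.

```sql
-- Hypothetical hypertable; only the columns x and "timestamp" come from the question.
CREATE TABLE readings_ht (
    x           text             NOT NULL,
    "timestamp" timestamptz      NOT NULL,
    value       double precision
);

-- Point 3: create the hypertable without the default time index.
SELECT create_hypertable('readings_ht', 'timestamp',
                         create_default_indexes => FALSE);

-- Add only the index you actually want to compare.
CREATE INDEX readings_ht_x_timestamp_idx
    ON readings_ht (x, "timestamp" DESC);

-- Point 1: list the chunks, to see how many a given time range can touch.
SELECT show_chunks('readings_ht');

-- Points 1 and 2: the plan shows which chunks are scanned and whether each
-- one uses an index scan or a sequential scan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM readings_ht
WHERE x = 'somestring'
  AND "timestamp" BETWEEN '2018-05-01' AND '2018-05-31';
```

If the plan lists only a few chunks for a narrow time range, chunk exclusion is doing most of the work, which would explain why an extra (x, timestamp) index changes little. Running the same EXPLAIN with and without the index on a realistic data volume is the most direct way to compare.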

suntruth answered Nov 05 '22 05:11