Why does PostgreSQL (TimescaleDB) use more storage for a table than expected?

I'm new to databases. I recently started using TimescaleDB, which is a PostgreSQL extension, so I guess this question is PostgreSQL-related as well.

I observed some strange behavior. Based on my table structure (1 timestamp, 2 doubles), each row should take 24 bytes. I imported 2,750,182 rows from a CSV file (via psycopg2's copy_from), so by my manual calculation the table should be about 63 MB. But when I query TimescaleDB, it tells me the table size is 137 MB, the index size is 100 MB, and the total is 237 MB. I expected the table size to match my calculation, but it doesn't. Any idea why?
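For reference, a quick way to see where the space goes is to ask Postgres for the heap and index sizes separately. Below is a minimal sketch with psycopg2; the connection string and the table name `readings` are placeholders for whatever was actually used, and note that for a TimescaleDB hypertable the rows live in chunk tables, so TimescaleDB's own size functions (which vary by version) may be more accurate than the plain Postgres ones shown here:

    import psycopg2

    # Placeholder DSN and table name -- substitute your own.
    conn = psycopg2.connect("dbname=mydb user=postgres")
    cur = conn.cursor()
    table = "readings"

    cur.execute(
        """
        SELECT pg_size_pretty(pg_relation_size(%s::regclass)),       -- heap (row data) only
               pg_size_pretty(pg_indexes_size(%s::regclass)),        -- all indexes on the table
               pg_size_pretty(pg_total_relation_size(%s::regclass))  -- heap + indexes + TOAST
        """,
        (table, table, table),
    )
    heap, indexes, total = cur.fetchone()
    print(f"heap={heap}  indexes={indexes}  total={total}")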

asked Nov 23 '17 by Xiang Zhang


1 Answer

There are two basic reasons your table is bigger than you expect:

  1. Per-tuple overhead: An answer to a related question goes into detail that I won't repeat here, but basically Postgres uses 23 (+padding) bytes per row for various internal bookkeeping, mostly multi-version concurrency control (MVCC) management (Bruce Momjian has some good intros if you want more info). That overhead gets you pretty darn close to the 137 MB you are seeing; see the worked calculation after this list. Any remaining difference is likely due to the table's fillfactor setting or to dead rows still present in the table from, say, a previous insert and subsequent delete.
  2. Index size: Unlike some other DBMSs, Postgres does not organize its tables on disk around an index unless you manually cluster the table on one, and even then it will not maintain the clustering over time (see https://www.postgresql.org/docs/10/static/sql-cluster.html). Instead it keeps its indexes separately, which is why there is extra space for your index. If on-disk size really matters to you and you aren't using the index for, say, enforcing a uniqueness constraint, you might consider a BRIN index, especially if your data is inserted in roughly time order (see https://www.postgresql.org/docs/10/static/brin-intro.html); there is a short sketch below.
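To make the per-tuple overhead from point 1 concrete, here is a back-of-the-envelope check. This is a sketch: the 8 kB page, the 24-byte page header, the 4-byte line pointer, and the 23-byte tuple header padded to 24 are standard Postgres heap-layout constants, and the row count comes from the question:

    ROWS = 2750182       # rows imported in the question
    PAGE = 8192          # default Postgres page size
    PAGE_HEADER = 24     # fixed header at the start of each page
    LINE_POINTER = 4     # per-row item pointer stored in the page
    TUPLE_HEADER = 24    # 23-byte tuple header, padded to 8-byte alignment
    DATA = 24            # 1 timestamp + 2 doubles = 3 * 8 bytes

    per_row = TUPLE_HEADER + DATA + LINE_POINTER      # 52 bytes per row
    rows_per_page = (PAGE - PAGE_HEADER) // per_row   # 157 rows fit on a page
    pages = -(-ROWS // rows_per_page)                 # ceiling division -> 17518 pages
    print(pages * PAGE / 1024 / 1024)                 # ~136.9 MB, matching the observed 137 MB

So the 137 MB heap is almost exactly what Postgres' row and page overhead predicts for 2.75 million rows of 24-byte payload.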
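And here is a sketch of the BRIN suggestion from point 2, again with placeholder names (a `readings` table with a timestamp column `ts`). BRIN stores one summary entry per range of pages rather than one entry per row, so it only helps when on-disk order correlates with the indexed column, which is typical for append-only time-series loads; on a hypertable, TimescaleDB should propagate the index definition to each chunk:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder DSN
    cur = conn.cursor()

    # Hypothetical table 'readings' with timestamp column 'ts'.
    cur.execute("CREATE INDEX IF NOT EXISTS readings_ts_brin ON readings USING brin (ts)")
    conn.commit()

    cur.execute("SELECT pg_size_pretty(pg_relation_size('readings_ts_brin'::regclass))")
    print(cur.fetchone()[0])  # typically kilobytes, versus the ~100 MB btree in the question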
answered Sep 19 '22 by davidk