
How big is too big for a PostgreSQL table?

People also ask

Can Postgres handle 100 million rows?

If you're simply filtering the data and the data fits in memory, Postgres can scan roughly 5-10 million rows per second (assuming a reasonable row size of, say, 100 bytes). If you're aggregating, expect roughly 1-2 million rows per second.
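As a rough way to check those numbers on your own hardware, you can time a filter query against an aggregate query with EXPLAIN ANALYZE. This is only a sketch; the events table and its columns are illustrative, not from the answer:

-- simple filter over a table that fits in memory (names are hypothetical)
explain analyze select * from events where status = 'open';

-- aggregate over the same table; expect a lower rows-per-second rate
explain analyze select count(*), avg(amount) from events;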

Is Postgres suitable for big data?

PostgreSQL is well known as the most advanced open-source database, and it helps you manage your data no matter how big, small or varied the dataset is. You can use it to manage or analyze your big data, and there are several ways to make this possible, e.g. by combining it with Apache Spark.

Can Postgres handle millions of records?

Rows per table won't be an issue on its own. Roughly speaking, 1 million rows a day for 90 days is 90 million rows, and I see no reason Postgres can't deal with that, without knowing all the details of what you are doing. I agree that 90 million rows won't be a problem for PostgreSQL.

Can Postgres handle billions of rows?

As commercial database vendors brag about their capabilities, we decided to push PostgreSQL to the next level and exceed 1 billion rows per second to show what can be done with open source. To those who need even more: 1 billion rows is by far not the limit; a lot more is possible. Watch and see how we did it.


Rows per table won't be an issue on its own.

So, roughly speaking, 1 million rows a day for 90 days is 90 million rows. Without knowing all the details of what you are doing, I see no reason Postgres can't deal with that.

Depending on your data distribution, you can use a mixture of indexes, partial (filtered) indexes, and some kind of table partitioning to speed things up once you see which performance issues you do or do not have; a sketch of the partitioning approach follows the test query below. Your problem will be the same on any other RDBMS that I know of. If you only need 3 months' worth of data, design in a process that prunes off the data you no longer need, so the table holds a consistent volume of data. You're lucky that you know how much data will exist: test at that volume and see what you get. Testing one table with 90 million rows may be as easy as:

-- create a 90-million-row test table (the name is arbitrary; this takes a while and several GB of disk)
create table test_volume as
select x, 1 as c2, 2 as c3
from generate_series(1, 90000000) x;
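For the pruning approach mentioned above, declarative range partitioning (PostgreSQL 10+) makes dropping expired data cheap. A minimal sketch, assuming a hypothetical measurements table partitioned by month on a created_at timestamp:

create table measurements (
    id         bigint generated always as identity,
    created_at timestamptz not null,
    payload    text
) partition by range (created_at);

create table measurements_2024_01 partition of measurements
    for values from ('2024-01-01') to ('2024-02-01');
create table measurements_2024_02 partition of measurements
    for values from ('2024-02-01') to ('2024-03-01');

-- once a partition ages out of the 3-month retention window, dropping it
-- removes all of its rows almost instantly, with no dead tuples left behind
drop table measurements_2024_01;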

https://wiki.postgresql.org/wiki/FAQ

Limit   Value
Maximum Database Size       Unlimited
Maximum Table Size          32 TB
Maximum Row Size            1.6 TB
Maximum Field Size          1 GB
Maximum Rows per Table      Unlimited
Maximum Columns per Table   250 - 1600 depending on column types
Maximum Indexes per Table   Unlimited
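To see how far an existing table is from these limits, you can check its on-disk size with the built-in size functions (your_table is a placeholder):

-- total size including indexes and TOAST data
select pg_size_pretty(pg_total_relation_size('your_table'));
-- size of the table's heap alone
select pg_size_pretty(pg_relation_size('your_table'));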

Another way to speed up your queries significantly on a table with more than 100 million rows is to CLUSTER the table, during off hours, on the index that is most often used in your queries. We have a table with more than 218 million rows and have seen 30x improvements.

Also, for a very large table, it's a good idea to create an index on your foreign keys.
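PostgreSQL does not create that index automatically when you declare a foreign key, so it has to be added explicitly. A small illustrative example (table and column names are hypothetical):

-- index the referencing column of a foreign key so joins and
-- cascading updates/deletes on the parent don't force sequential scans
create index orders_customer_id_idx on orders (customer_id);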

EDIT: From the comments:

EXAMPLE:

  1. The table I am referring to is called investment in this example.
  2. The index most used in queries is (bankid,record_date)

So here is your step by step:

  1. psql -c "drop index investment_bankid_rec_dt_idx;" ccbank
  2. psql -c "create index investment_bankid_rec_dt_idx on investment(bankid, record_date);" ccbank
  3. psql -c "cluster investment_bankid_rec_dt_idx on investment;" ccbank
  4. vacuumdb -d ccbank -z -v -t investment

So in steps one and two we drop the index and recreate it. In step three we cluster the table on that index, which physically reorders the rows to match it, and in step four we vacuum and analyze the table so the planner has up-to-date statistics.
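Roughly the same maintenance pass, assuming the same table and index names, can also be run from a single psql session; note that CLUSTER holds an exclusive lock on the table for its duration, which is why the off-hours caveat matters. This is a sketch, not part of the original answer:

-- drop and rebuild the composite index, then physically reorder the table on it
drop index if exists investment_bankid_rec_dt_idx;
create index investment_bankid_rec_dt_idx on investment (bankid, record_date);
cluster investment using investment_bankid_rec_dt_idx;
-- refresh statistics and reclaim dead space after the rewrite
vacuum (analyze, verbose) investment;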