I'm a relative Postgres newbie, though I have some basic experience with MSSQL.
I have a table in PostgreSQL (PostGIS, it's spatial) that contains about 10,000,000 polygons. The machine it's sitting on has 64 GB of RAM, 16 cores and a 1 TB spinning HDD. This is one of the only tables in the database at the moment. Because access to the table is infrequent (perhaps once every few hours), I'm noticing that the table will not stay in RAM, as I would expect with MSSQL. Instead, the table seems to be released from memory and sits on disk even though the database is active. This leads to 100% HDD utilization for 15+ minutes when I want to query / join / interrogate it / etc. When the table does appear to be in memory, subsequent operations are notably faster (seconds rather than minutes).
Is there a way to ask Postgres to keep a certain table in memory, or to have the scheduler / whatever bits of Postgres perform the smarts keep the table in RAM, rather than letting it go to disk and then having to recall it into memory when required?
I have spatial indexes (and indexes on a couple of other columns that often require filtering / sorting), so when the table is served from memory it's quite fast.
This same issue also seems to massively affect JOINs, because they too require the table to be read first. That's a separate issue for me, but it seems to stem from the same root problem: being disk-I/O bound.
My DB settings are as follows, so as far as I can tell I'm not generally limited by available memory / RAM.
Edit: The table is 26 GB.
Postgres 13.2 with PostGIS 3.1.1
max_connections = '20';
shared_buffers = '8GB';
effective_cache_size = '24GB';
maintenance_work_mem = '2047MB';
checkpoint_completion_target = '0.9';
wal_buffers = '16MB';
default_statistics_target = '500';
random_page_cost = '4';
work_mem = '26214kB';
min_wal_size = '4GB';
max_wal_size = '16GB';
max_worker_processes = '16';
max_parallel_workers_per_gather = '8';
max_parallel_workers = '16';
max_parallel_maintenance_workers = '4';
You can't run Pg in-process or in-memory: PostgreSQL is implemented in C and compiled to platform code. Unlike H2 or Derby, you can't just load a jar and fire it up as a throwaway in-memory DB. And unlike SQLite, which is also written in C and compiled to platform code, PostgreSQL can't be loaded in-process either.
PostgreSQL normally stores its table data in blocks of 8KB. The number of these blocks per table is limited to an unsigned 32-bit integer (about four billion), giving a maximum table size of 32TB with the default block size.
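If you want to verify the block size on your own installation, it can be read back with a simple query:

SHOW block_size;  -- 8192 bytes on a default build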
Since plain VACUUM does not take an exclusive lock on the table, it has little to no impact on other database work. Autovacuum should be configured based on the usage pattern of the database.
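As a rough sketch of what per-table tuning can look like (the table name and values are placeholders, not recommendations), autovacuum thresholds can be overridden via storage parameters:

ALTER TABLE my_polygons SET (autovacuum_vacuum_scale_factor = 0.05, autovacuum_analyze_scale_factor = 0.02);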
The easiest and most efficient way to export data from a Postgres table to a CSV file is the COPY command. COPY generates the CSV file on the database server. You can export an entire table, or the results of a query, with COPY ... TO.
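For example (the table name, columns and paths are placeholders, and the server-side path must be writable by the postgres process):

COPY my_polygons TO '/tmp/my_polygons.csv' WITH (FORMAT csv, HEADER);
COPY (SELECT gid, name FROM my_polygons LIMIT 100) TO '/tmp/my_polygons_subset.csv' WITH (FORMAT csv, HEADER);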
You didn't say how large the table is.
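For reference, the on-disk size including indexes and TOAST can be checked like this (the table name is a placeholder):

SELECT pg_size_pretty(pg_total_relation_size('my_polygons'));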
Data is never evicted from shared buffers just out of boredom. Only to make room for other things, or because they became invalid (table dropped or truncated, etc.). So if you make shared buffers large enough and read the whole table in with pg_prewarm, it will stay there until displaced. (There is a mechanism for sequential scans of large tables that will preferentially evict data it just read to make room for more data from the same table, but pg_prewarm is not subject to that). If the entire database fits in shared buffers, it will all be retained indefinitely.
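A minimal sketch of that approach, assuming the pg_prewarm extension is available and using placeholder table/index names:

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('my_polygons');           -- load the heap into shared buffers
SELECT pg_prewarm('my_polygons_geom_idx');  -- prewarm the spatial index as well

For the whole table to stay resident, shared_buffers would need to be considerably larger than the current 8GB, since the table alone is 26GB.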
Stock PostgreSQL offers no other way to pin a table or list of tables into memory.