
PostgreSQL Table in memory

I created a database containing 3 tables for a specific purpose. The total size of all tables is about 850 MB - very lean. A single table accounts for about 800 MB of that (including its index), holding 5 million records with about 6,000 new records added daily.

The system is PostgreSQL on Windows: a Windows 7 laptop with 8 GB RAM and an SSD. I allocated 2048 MB as shared_buffers, 256 MB as temp_buffers and 128 MB as work_mem. I execute a single query multiple times against that one table, hoping that the table stays in RAM (hence the above parameters). But although I see a spike in memory usage during execution (of about 200 MB), I do not see memory consumption remain at 500 MB or more, which the data would need in order to stay in memory. All running postgres.exe processes show 2-6 MB in Task Manager. Hence, I suspect the LRU does not keep the data in memory.
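For reference, the relevant lines in postgresql.conf look like this (the values are the ones described above; the comments are just annotations):

    shared_buffers = 2048MB   # PostgreSQL's own shared buffer cache
    temp_buffers = 256MB      # per-session buffers for temporary tables
    work_mem = 128MB          # per-operation memory for sorts and hashes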

Average query execution time is about 2 seconds (a very simple single-table query), but I need to get it down to about 10-20 ms or less if possible, simply because the same query is going to be executed a great many times - and I believe that can only be achieved by keeping the data in memory. Any advice?

Regards, Kapil

Asked Dec 23 '13 by Kapil

1 Answer

You should not expect postgres processes to show large memory use, even if the whole database is cached in RAM.

That is because PostgreSQL relies on buffered reads from the operating system buffer cache. In simplified terms, when PostgreSQL does a read(), the OS looks to see whether the requested blocks are cached in the "free" RAM that it uses for disk cache. If the block is in cache, the OS returns it almost instantly. If the block is not in cache the OS reads it from disk, adds it to the disk cache, and returns the block. Subsequent reads will fetch it from the cache unless it's displaced from the cache by other blocks.

That means that if you have enough free memory to fit the whole database in "free" operating system memory, you won't tend to hit the disk for reads.

Depending on the OS, behaviour for disk writes may differ. Linux uses write-back caching for "dirty" buffers, and will still return blocks from cache even after they've been written to. It writes these back to the disk lazily unless forced to write them immediately by an fsync(), as Pg issues at COMMIT time. When it writes them back it marks the cached blocks clean, but does not evict them from the cache. I don't know how Windows behaves here.

The point is that PostgreSQL can be running entirely out of RAM with a 1GB database, even though no PostgreSQL process seems to be using much RAM. Having shared_buffers too high just leads to double-caching and can reduce the amount of RAM available for the OS to cache blocks.
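If you want to see what is actually sitting in shared_buffers (the PostgreSQL side of that double-caching), the standard contrib module pg_buffercache can show you. A minimal sketch, run as a superuser:

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- Count how many 8 kB shared buffers each relation in the current
    -- database occupies right now.
    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
    WHERE b.reldatabase = (SELECT oid FROM pg_database
                           WHERE datname = current_database())
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;

This tells you nothing about the OS cache, though - only pg_fincore (below) or OS tools can do that.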

It isn't easy to see exactly what's cached in RAM because Pg relies on the OS cache. That's why I referred you to pg_fincore.
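For reference, usage looks roughly like this on Linux (the extension is installed under the name pgfincore, and big_table is a placeholder table name):

    -- Linux only: pgfincore relies on mincore(), so it won't work on Windows.
    CREATE EXTENSION pgfincore;

    -- Returns one row per file segment of the relation; pages_mem is the
    -- number of OS pages currently resident in the OS page cache, out of
    -- rel_os_pages total.
    SELECT * FROM pgfincore('big_table');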

If you're on Windows, where that won't work, you really just have to rely on observing disk activity. Does Performance Monitor show lots of uncached disk reads? Does the operating system's memory monitoring show lots of memory being used for disk cache?

Make sure that effective_cache_size correctly reflects the RAM used for disk cache. It will help PostgreSQL choose appropriate query plans.
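For example, on the 8 GB laptop described in the question, something like this would be a plausible starting point (the value is an estimate, not an allocation - PostgreSQL reserves nothing for it):

    # postgresql.conf: a planner hint describing how much RAM is likely
    # to be available for caching (OS disk cache plus shared_buffers).
    effective_cache_size = 4GB

You can check the current value from psql with SHOW effective_cache_size;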

You are making the assumption, without apparent evidence, that the query performance you are experiencing is explained by disk read delays, and that it can be improved by in-memory caching. This may not be the case at all. You need to look at EXPLAIN ANALYZE output and system performance metrics to see what's going on.
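For example (big_table and the filter are placeholders; the BUFFERS option is available in PostgreSQL 9.0 and later):

    -- BUFFERS reports, per plan node, how many blocks were found in
    -- shared_buffers ("shared hit=...") versus fetched with read()
    -- ("read=..."). Blocks counted as "read" may still have come from
    -- the OS page cache rather than the physical disk.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM big_table WHERE some_column = 42;

If the plan shows mostly "shared hit" and the query is still slow, caching is not your problem and you should look at the plan itself instead.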

Answered by Craig Ringer