Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Shared hit cache in postgreSQL

I'm experimenting with the EXPLAIN command and trying to find out what the shared hit is.

Seq Scan on foo  (cost=0.00..18334.00 rows=1000000 width=37) (actual time=0.030..90.500 rows=1000000 loops=1)
  Buffers: shared hit=512 read=7822
Total runtime: 116.080 ms

I've noticed that the more shared hit number we have the faster a query is being executed. But what is that? As far as I got, shared read is just reading from a physical storage like RAID or SSD. But why is the shared hit faster? Is it stored inside RAM or where?

like image 688
St.Antario Avatar asked Oct 02 '15 12:10

St.Antario


People also ask

What is cache hit ratio Postgres?

Validated on 28 July 2022 • Posted on 28 July 2022. Cache hit ratio is a metric that measures the efficiency of key retrieval in a database. Cache hit ratio is the ratio of key hits against the total number of key hits and misses.

What is Work_mem in Postgres?

work_mem ( integer ) Sets the base maximum amount of memory to be used by a query operation (such as a sort or hash table) before writing to temporary disk files. If this value is specified without units, it is taken as kilobytes. The default value is four megabytes ( 4MB ).

Does Postgres have a cache?

Postgres manages a “Shared Buffer Cache”, which it allocates and uses internally to keep data and indexes in memory. This is usually configured to be about 25% of total system memory for a server running a dedicated Postgres instance, such as all Heroku Postgres instances.

What is shared buffer in PostgreSQL?

shared_buffers. PostgreSQL uses 'double buffering', meaning that PostgreSQL uses its own internal buffer as well as kernel buffered IO. In short, this means that data is stored in memory twice. The PostgreSQL buffer is named shared_buffers and it defines how much dedicated system memory PostgreSQL will use for cache.


1 Answers

shared hit essentially means the value has already been cached in the main memory of the computer and it was not necessary to read this from the hard disk.

Accessing the main memory (RAM) is much faster than reading values from the hard disk. And that's why the query is faster the more share hits it has.

Immediately after starting Postgres, none of the data is available in the main memory (RAM) and everything needs to be read from the hard disk.

Consider this step from an execution plan:

  ->  Seq Scan on products.product_price  (cost=0.00..3210.27 rows=392273 width=0) (actual time=0.053..103.958 rows=392273 loops=1)
        Output: product_id, valid_from, valid_to, price
        Buffers: shared read=2818
        I/O Timings: read=48.382

The part "Buffers: shared read=2818" means that 2818 blocks (each 8k) had to be read from the hard disk (and that took 48ms - I have a SSD). Those 2818 blocks were stored in the cache (the "shared buffers") so that the next time they are needed the database does not need to read them (again) from the (slow) hard disk.

When I re-run that statement the plan changes to:

  ->  Seq Scan on products.product_price  (cost=0.00..3210.27 rows=392273 width=0) (actual time=0.012..45.690 rows=392273 loops=1)
        Output: product_id, valid_from, valid_to, price
        Buffers: shared hit=2818

Which means that those 2818 blocks that the previous statement were still in the main memory (=RAM) and Postgres did not need to read them from the hard disk.

"memory" always refers to the main memory (RAM) built into the computer and directly accessible to the CPU - as opposed to "external storage".

There are several presentations on how Postgres manages the shared buffers:

  • http://de.slideshare.net/EnterpriseDB/insidepostgressharedmemory2015
  • http://momjian.us/main/writings/pgsql/hw_performance/
  • https://2ndquadrant.com/media/pdfs/talks/InsideBufferCache.pdf (very technical)
  • http://raghavt.blogspot.de/2012/04/caching-in-postgresql.html
like image 139
a_horse_with_no_name Avatar answered Sep 19 '22 11:09

a_horse_with_no_name