Slow on first query

Tags:

postgresql

I'm having trouble with the first query I run against a table: it is slow, while subsequent queries are much faster, even if I change the date range. I assume PostgreSQL has a caching mechanism that makes the subsequent queries faster. I could warm up the cache so that the first user request hits it. However, I think the following query itself can also be improved:

SELECT
    y.id, y.title, x.visits, x.score
FROM (
    SELECT
        article_id, visits,
        COALESCE(ROUND((visits / NULLIF(hits ,0)::float)::numeric, 4), 0) score
    FROM (
        SELECT
            article_id, SUM(visits) visits, SUM(hits) hits
        FROM
            article_reports a
        WHERE
            a.site_id = 'XYZ' AND a.date >= '2017-04-13'  AND a.date <= '2017-06-28'
        GROUP BY
            article_id
    ) q ORDER BY score DESC, visits DESC LIMIT(20)
) x 
INNER JOIN 
    articles y ON x.article_id = y.id

Any ideas on how I can improve this? The following is the result of EXPLAIN ANALYZE:

 Nested Loop  (cost=84859.76..85028.54 rows=20 width=272) (actual time=12612.596..12612.836 rows=20 loops=1)
   ->  Limit  (cost=84859.34..84859.39 rows=20 width=52) (actual time=12612.502..12612.517 rows=20 loops=1)
         ->  Sort  (cost=84859.34..84880.26 rows=8371 width=52) (actual time=12612.499..12612.503 rows=20 loops=1)
               Sort Key: q.score DESC, q.visits DESC
               Sort Method: top-N heapsort  Memory: 27kB
               ->  Subquery Scan on q  (cost=84218.04..84636.59 rows=8371 width=52) (actual time=12513.168..12602.649 rows=28965 loops=1)
                     ->  HashAggregate  (cost=84218.04..84301.75 rows=8371 width=36) (actual time=12513.093..12536.823 rows=28965 loops=1)
                           Group Key: a.article_id
                           ->  Bitmap Heap Scan on article_reports a  (cost=20122.78..77122.91 rows=405436 width=36) (actual time=135.588..11974.774 rows=398242 loops=1)
                                 Recheck Cond: (((site_id)::text = 'XYZ'::text) AND (date >= '2017-04-13'::date) AND (date <= '2017-06-28'::date))
                                 Heap Blocks: exact=36911
                                 ->  Bitmap Index Scan on index_article_reports_on_site_id_and_article_id_and_date  (cost=0.00..20021.42 rows=405436 width=0) (actual time=125.846..125.846 rows=398479 loops=1)
                                       Index Cond: (((site_id)::text = 'XYZ'::text) AND (date >= '2017-04-13'::date) AND (date <= '2017-06-28'::date))
   ->  Index Scan using articles_pkey on articles y  (cost=0.42..8.44 rows=1 width=128) (actual time=0.014..0.014 rows=1 loops=20)
         Index Cond: (id = q.article_id)
 Planning time: 1.443 ms
 Execution time: 12613.689 ms

Thanks in advance

asked Jun 28 '17 17:06

absg

2 Answers

There are two levels of "cache" that Postgres uses:

OS file cache
shared buffers.

Important: Postgres directly controls only the second one and relies on the first one, which is under the OS's control.

The first thing I would check is these two settings in postgresql.conf:

effective_cache_size – I usually set it to about 3/4 of the available RAM. Note that this setting does not make Postgres allocate any memory; it is only a hint that gives the planner an estimate of the OS file cache size.
shared_buffers – I usually set it to 1/4 of the RAM. This one is an actual allocation setting.

I'd also check the other memory-related settings (work_mem, maintenance_work_mem) to understand how much RAM they might consume, and therefore whether my effective_cache_size estimate will hold most of the time.
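As a rough sketch, these settings can be inspected and changed from psql as follows. The 16 GB of RAM and the values derived from it are assumptions for illustration only, not a recommendation for your server.

```sql
-- Inspect the current values.
SHOW shared_buffers;
SHOW effective_cache_size;
SHOW work_mem;
SHOW maintenance_work_mem;

-- Assumption: a dedicated database server with 16 GB of RAM.
ALTER SYSTEM SET shared_buffers = '4GB';         -- about 1/4 of RAM; takes effect only after a restart
ALTER SYSTEM SET effective_cache_size = '12GB';  -- about 3/4 of RAM; planner hint, allocates nothing

-- Reload the configuration for settings that do not need a restart
-- (effective_cache_size is one of them).
SELECT pg_reload_conf();
```

ALTER SYSTEM (PostgreSQL 9.4 and later, superuser only) writes to postgresql.auto.conf; editing postgresql.conf by hand works just as well.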

But if you have just started Postgres, the first queries will most likely be slow, because neither the OS file cache nor the shared buffers hold any of the data yet. You can check this with the extended EXPLAIN options:

EXPLAIN (ANALYZE, BUFFERS) SELECT ...

-- you will see how many buffers were found in shared buffers ("hit") and how many had to be read from outside them ("read"), i.e. from the OS file cache or from disk
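For example, applied to the aggregation step of the query in the question (the part that dominates the plan), it could look like the sketch below. Run it once right after a restart and once again afterwards, then compare the two outputs.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT article_id, SUM(visits) AS visits, SUM(hits) AS hits
FROM article_reports a
WHERE a.site_id = 'XYZ'
  AND a.date >= '2017-04-13'
  AND a.date <= '2017-06-28'
GROUP BY article_id;

-- Each plan node then carries a line of the form
--   Buffers: shared hit=... read=...
-- A cold run should show mostly "read" on the Bitmap Heap Scan node,
-- a warm run mostly "hit".
```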

Here you can find good material on using EXPLAIN: http://www.dalibo.org/_media/understanding_explain.pdf

Additionally, there is an extension that aims to solve the "cold cache" problem: pg_prewarm https://www.postgresql.org/docs/current/static/pgprewarm.html

Also, using SSDs instead of magnetic disks makes disk reads much faster.
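A minimal sketch of warming the cache with pg_prewarm, using the table and index names that appear in the question's plan. It assumes the extension is installed on your server and that both relations fit into shared buffers.

```sql
-- One-time setup in the database that holds the tables.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load the table and the index used by the query into shared buffers.
-- Each call returns the number of blocks it prewarmed.
SELECT pg_prewarm('article_reports');
SELECT pg_prewarm('index_article_reports_on_site_id_and_article_id_and_date');
```

This would have to be repeated after every restart, for example from a startup script.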

Have fun, and may your Postgres run well :-)

answered Oct 01 '22 00:10

Nick

If this is the first query after inserting a large number of rows, you should run

ANALYZE

on the whole database or on the tables involved. Try running it at the database level first.
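Both variants, sketched with the table names from the question:

```sql
-- Refresh planner statistics only for the tables used by the query.
ANALYZE article_reports;
ANALYZE articles;

-- Or refresh statistics for every table in the current database.
ANALYZE;
```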

answered Oct 01 '22 00:10

Emilio Platzer
