
Slow on first query

Tags:

postgresql

I'm having trouble with the first query I run against a table: it is slow, while subsequent queries are much faster, even if I change the date range. I assume PostgreSQL implements a caching mechanism that makes the subsequent queries faster. I could try to warm up the cache so that the first user request hits it. However, I think the following query itself can be improved:

SELECT
    y.id, y.title, x.visits, x.score
FROM (
    SELECT
        article_id, visits,
        COALESCE(ROUND((visits / NULLIF(hits ,0)::float)::numeric, 4), 0) score
    FROM (
        SELECT
            article_id, SUM(visits) visits, SUM(hits) hits
        FROM
            article_reports a
        WHERE
            a.site_id = 'XYZ' AND a.date >= '2017-04-13'  AND a.date <= '2017-06-28'
        GROUP BY
            article_id
    ) q ORDER BY score DESC, visits DESC LIMIT(20)
) x 
INNER JOIN 
    articles y ON x.article_id = y.id

Any ideas on how I can improve this? The following is the output of EXPLAIN ANALYZE:

   Nested Loop  (cost=84859.76..85028.54 rows=20 width=272) (actual time=12612.596..12612.836 rows=20 loops=1)
  ->  Limit  (cost=84859.34..84859.39 rows=20 width=52) (actual time=12612.502..12612.517 rows=20 loops=1)
    ->  Sort  (cost=84859.34..84880.26 rows=8371 width=52) (actual time=12612.499..12612.503 rows=20 loops=1)
          Sort Key: q.score DESC, q.visits DESC
          Sort Method: top-N heapsort  Memory: 27kB
          ->  Subquery Scan on q  (cost=84218.04..84636.59 rows=8371 width=52) (actual time=12513.168..12602.649 rows=28965 loops=1)
                ->  HashAggregate  (cost=84218.04..84301.75 rows=8371 width=36) (actual time=12513.093..12536.823 rows=28965 loops=1)
                      Group Key: a.article_id
                      ->  Bitmap Heap Scan on article_reports a  (cost=20122.78..77122.91 rows=405436 width=36) (actual time=135.588..11974.774 rows=398242 loops=1)
                            Recheck Cond: (((site_id)::text = 'XYZ'::text) AND (date >= '2017-04-13'::date) AND (date <= '2017-06-28'::date))
                            Heap Blocks: exact=36911
                            ->  Bitmap Index Scan on index_article_reports_on_site_id_and_article_id_and_date  (cost=0.00..20021.42 rows=405436 width=0) (actual time=125.846..125.846 rows=398479 loops=1)
                                  Index Cond: (((site_id)::text = 'XYZ'::text) AND (date >= '2017-04-13'::date) AND (date <= '2017-06-28'::date))
  ->  Index Scan using articles_pkey on articles y  (cost=0.42..8.44 rows=1 width=128) (actual time=0.014..0.014 rows=1 loops=20)
       Index Cond: (id = q.article_id)
 Planning time: 1.443 ms
 Execution time: 12613.689 ms

Thanks in advance

absg asked Jun 28 '17 17:06


2 Answers

There are two levels of "cache" that Postgres uses:

  • OS file cache
  • shared buffers.

Important: Postgres directly controls only the second one and relies on the first, which is under the operating system's control.

The first things I would check are these two settings in postgresql.conf:

  • effective_cache_size – I usually set it to ~3/4 of all available RAM. Note that this setting does not tell Postgres how to allocate memory; it is only a hint that gives the planner an estimate of the OS file cache size.
  • shared_buffers – I usually set it to 1/4 of RAM. This one is an actual allocation setting.
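As a rough sketch of the rule of thumb above, assuming a hypothetical dedicated server with 16 GB of RAM (the sizes below are illustrative assumptions, not values from the question), the two settings could be inspected and changed from psql instead of editing postgresql.conf by hand:

```sql
-- Inspect the current values.
SHOW shared_buffers;
SHOW effective_cache_size;

-- Hypothetical 16 GB machine: 1/4 of RAM for shared_buffers,
-- about 3/4 of RAM as the planner's cache estimate.
ALTER SYSTEM SET shared_buffers = '4GB';
ALTER SYSTEM SET effective_cache_size = '12GB';

-- effective_cache_size takes effect after a configuration reload;
-- shared_buffers only changes after a full server restart.
SELECT pg_reload_conf();
```

ALTER SYSTEM needs superuser rights and PostgreSQL 9.4 or later; it writes the values to postgresql.auto.conf, which overrides postgresql.conf.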

I'd also check the other memory-related settings (work_mem, maintenance_work_mem) to understand how much RAM Postgres might consume, and therefore whether my effective_cache_size estimate will hold most of the time.
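One way to review these settings in a single place is to query the pg_settings view. This is only a sketch; max_connections is not mentioned above and is included here as an assumption, because work_mem can be used by each sort or hash operation in each connection, so the two together bound the worst case.

```sql
-- List the memory-related settings with their units and whether a
-- change needs a reload or a restart (the "context" column).
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size',
               'work_mem', 'maintenance_work_mem', 'max_connections')
ORDER BY name;
```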

But if you have just started Postgres, the first queries will most probably be slow because there is no data yet in the OS file cache or in shared buffers. You can check this with the advanced EXPLAIN options:

EXPLAIN (ANALYZE, BUFFERS) SELECT ...

-- you will see how many buffers were found in shared buffers ("hit") and how many had to be read from the OS file cache or from disk ("read")
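Applied to the query from the question, this might look like the sketch below. Only the inner aggregation is shown, since the posted plan spends almost all of its time in the Bitmap Heap Scan; the alias a is taken from the EXPLAIN output.

```sql
-- Note: EXPLAIN ANALYZE really executes the statement.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    a.article_id, SUM(a.visits) AS visits, SUM(a.hits) AS hits
FROM
    article_reports a
WHERE
    a.site_id = 'XYZ' AND a.date >= '2017-04-13' AND a.date <= '2017-06-28'
GROUP BY
    a.article_id;
```

Each plan node then carries a line of the form "Buffers: shared hit=... read=...". Running the statement twice in a row and comparing the two outputs should show whether the slow first run is dominated by "read" blocks that turn into "hit" blocks on the second run.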

Here you can find good material on using EXPLAIN: http://www.dalibo.org/_media/understanding_explain.pdf

Additionally, there is an extension that aims to solve the "cold cache" problem, pg_prewarm: https://www.postgresql.org/docs/current/static/pgprewarm.html

Also, using SSDs instead of magnetic disks will make disk reads much faster.
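A minimal sketch of prewarming the relations from the question, assuming the pg_prewarm contrib module is installed on the server (the table and index names are the ones that appear in the posted plan):

```sql
-- Creating the extension typically needs superuser (or equivalent) rights.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load the table and the index used by the plan into shared buffers.
-- Each call returns the number of blocks that were prewarmed.
SELECT pg_prewarm('article_reports');
SELECT pg_prewarm('index_article_reports_on_site_id_and_article_id_and_date');
```

This only helps as long as the prewarmed blocks fit in shared_buffers and are not evicted by other activity, and it has to be repeated after a server restart.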

Have fun, and enjoy a well-working Postgres :-)

Nick answered Oct 01 '22 00:10


If this is the first query after inserting many rows, you should run

ANALYZE

on the whole database or on the tables involved, so that the planner works with up-to-date statistics. Try running it at the database level first.
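For the tables in the question, that could look like the sketch below. In the posted plan the estimated row count for the Bitmap Heap Scan (405436) is already close to the actual one (398242), so stale statistics may not be the main cause here; that is a reading of the plan, not a measurement.

```sql
-- Refresh planner statistics for the two tables used by the query...
ANALYZE article_reports;
ANALYZE articles;

-- ...or for every table in the current database.
ANALYZE;

-- Check when statistics were last gathered, manually or by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('article_reports', 'articles');
```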

Emilio Platzer answered Oct 01 '22 00:10