Why are PostgreSQL queries slower on the first request after a new connection than on subsequent requests?

Tags: sql, postgresql


I am using several different technologies to connect to a PostgreSQL database. The first request might take 1.5 seconds, while the exact same query takes 0.03 seconds the second time. If I open a second instance of my application (connecting to the same database), its first request again takes 1.5 seconds and the second 0.03 seconds.

Because the different technologies connect at different points and use different connection methods, I really don't think this has anything to do with any code I have written.

I'm thinking that opening a connection doesn't do 'everything' until the first request, so that request carries some extra overhead.

Because I have been using the database and have kept the server up, everything is in memory, so indexes and the like should not be the issue.

Edit: EXPLAIN tells me about the query, and honestly the query looks pretty good (indexed, etc.). I really think PostgreSQL has some kind of overhead on the first query of a new connection.

I don't know how to prove or disprove that. In pgAdmin III (version 1.12.3) all the queries seem fast, but with every other tool I have, the first query is slow. Most of the time it's not noticeably slower, and when it was I always chalked it up to loading the index into RAM. But this is clearly not that: if I open my tool(s) and run any other query that returns results, the second query is fast regardless. If the first query doesn't return results, the second is still slow and the third is fast.
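
One way to pin down where the overhead lives is to time the same query twice on a brand-new connection, outside any particular tool. A minimal sketch of such a test, assuming Python with psycopg2 and a hypothetical dbname=test DSN (any client library should show the same pattern; the table is the one from the queries below):

    import time
    import psycopg2  # assumption: any PostgreSQL client would do; psycopg2 shown

    def timed(cur, sql):
        """Execute a query, fetch all rows, and return the elapsed wall time."""
        start = time.perf_counter()
        cur.execute(sql)
        cur.fetchall()
        return time.perf_counter() - start

    conn = psycopg2.connect("dbname=test")  # hypothetical DSN
    cur = conn.cursor()
    sql = "select * from company where company_id = 39"

    print("first run:  %.4f s" % timed(cur, sql))   # pays the one-time cost
    print("second run: %.4f s" % timed(cur, sql))   # should be much faster

If the gap shows up here as well, it is not any one client tool doing the caching; the cost is tied to the connection itself.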

Edit 2: Even though I don't think the query itself has anything to do with the delay (every first query is slow), here are the EXPLAIN ANALYZE results for two queries:

EXPLAIN ANALYZE
select * from company
where company_id = 39

Output:

"Seq Scan on company  (cost=0.00..1.26 rows=1 width=54) (actual time=0.037..0.039 rows=1 loops=1)"
"  Filter: (company_id = 39)"
"Total runtime: 0.085 ms"

and:

EXPLAIN ANALYZE
select * from group_devices
where device_name ilike 'html5_demo'
and group_id in ( select group_id from manager_groups
where company_id in (select company_id from company where company_name ='TRUTHPT'))

Output:

"Nested Loop Semi Join  (cost=1.26..45.12 rows=1 width=115) (actual time=1.947..2.457 rows=1 loops=1)"
"  Join Filter: (group_devices.group_id = manager_groups.group_id)"
"  ->  Seq Scan on group_devices  (cost=0.00..38.00 rows=1 width=115) (actual time=0.261..0.768 rows=1 loops=1)"
"        Filter: ((device_name)::text ~~* 'html5_demo'::text)"
"  ->  Hash Semi Join  (cost=1.26..7.09 rows=9 width=4) (actual time=0.297..1.596 rows=46 loops=1)"
"        Hash Cond: (manager_groups.company_id = company.company_id)"
"        ->  Seq Scan on manager_groups  (cost=0.00..5.53 rows=509 width=8) (actual time=0.003..0.676 rows=469 loops=1)"
"        ->  Hash  (cost=1.26..1.26 rows=1 width=4) (actual time=0.035..0.035 rows=1 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 1kB"
"              ->  Seq Scan on company  (cost=0.00..1.26 rows=1 width=4) (actual time=0.025..0.027 rows=1 loops=1)"
"                    Filter: ((company_name)::text = 'TRUTHPT'::text)"
"Total runtime: 2.566 ms"
Asked Jun 05 '14 by Brian Hanf


1 Answer

I have observed the same behavior. If I start a new connection and run a query multiple times, the first execution is about 25% slower than the following executions. (The query has been run earlier on other connections, and I have verified that there is no disk I/O involved.) I profiled the process with perf during the first query execution, and this is what I found:

[perf profile of the first execution: a large share of samples in kernel page-fault handling]

As you can see, a lot of time is spent handling page faults. If I profile the second execution, there are no page faults. AFAICT, these are what are called minor (soft) page faults, which happen the first time a process accesses a page that is in shared memory: at that point, the process has to map the page into its virtual address space (see https://en.wikipedia.org/wiki/Page_fault). If the page instead has to be read from disk, it is called a major (hard) page fault.
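
One way to check for the same effect on another system is to count the backend's page faults while the first query runs. A rough sketch (not necessarily the exact procedure used here), assuming Python with psycopg2, a hypothetical dbname=test DSN, and a Linux machine with perf installed; attaching perf to another process usually requires elevated privileges:

    import signal
    import subprocess
    import time
    import psycopg2  # assumption: psycopg2; any client keeping one session works

    conn = psycopg2.connect("dbname=test")  # hypothetical DSN
    cur = conn.cursor()

    # Each connection is served by its own backend process.
    cur.execute("select pg_backend_pid()")
    backend_pid = cur.fetchone()[0]

    # Count soft and hard faults in the backend while the query executes.
    perf = subprocess.Popen(
        ["perf", "stat", "-e", "minor-faults,major-faults", "-p", str(backend_pid)]
    )
    time.sleep(1)  # give perf a moment to attach

    # First execution on this connection: expect a burst of minor faults.
    cur.execute("select * from company where company_id = 39")
    cur.fetchall()

    perf.send_signal(signal.SIGINT)  # perf stat prints its counters on SIGINT
    perf.wait()

Repeating the query with perf attached again should show the minor-fault count drop to almost nothing.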

This explanation also fits with other observations I have made: if I later run a different query on the same connection, the overhead of its first execution seems to depend on how much the data it accesses overlaps with the data accessed by the first query.
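
That overlap can be probed directly as well. A rough sketch under the same assumptions as above (psycopg2 against a hypothetical dbname=test DSN, using the asker's company and group_devices tables):

    import time
    import psycopg2  # assumption: psycopg2 against a hypothetical test database

    def first_run(cur, sql):
        """Time the first execution of a query on this connection."""
        start = time.perf_counter()
        cur.execute(sql)
        cur.fetchall()
        return time.perf_counter() - start

    conn = psycopg2.connect("dbname=test")  # hypothetical DSN
    cur = conn.cursor()

    # Map the shared-buffer pages of `company` into this backend's address space.
    first_run(cur, "select * from company")

    # Touches largely the same pages: little first-execution overhead expected.
    print("overlapping:     %.4f s" % first_run(cur, "select count(*) from company"))

    # Touches different pages: the mapping cost is paid here instead.
    print("non-overlapping: %.4f s" % first_run(cur, "select count(*) from group_devices"))

If the explanation holds, the non-overlapping query should carry noticeably more first-execution overhead, since its shared-buffer pages have not yet been mapped into this backend.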

Answered Sep 19 '22 by Øystein Grøvlen