Postgres 10.3: SELECT queries hang for hours

Question

My application uses Postgres as its DBMS. The version is 10.3, with the PostGIS extension installed.

Occasionally, at random intervals, the DBMS becomes slow and gets stuck on a few SELECT queries.

From pg_stat_activity I noticed that the wait_event_type and wait_event of these queries are as follows:

 select wait_event_type, wait_event from pg_stat_activity where state='active'; 
 wait_event_type |  wait_event  
-----------------+--------------
 IO              | DataFileRead
 IO              | DataFileRead
 IO              | DataFileRead
 IO              | DataFileRead
 LWLock          | buffer_io
 LWLock          | buffer_io
 IO              | DataFileRead
 LWLock          | buffer_io
 LWLock          | buffer_io
 IO              | DataFileRead
 IO              | DataFileRead
 LWLock          | buffer_io
 LWLock          | buffer_io
 IO              | DataFileRead
 LWLock          | buffer_io
 IO              | DataFileRead
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 IO              | DataFileRead
 IO              | DataFileRead
                 | 
 IO              | DataFileRead
 LWLock          | buffer_io
 LWLock          | buffer_io
(33 rows)

My assumption, after checking the docs, is that the underlying hardware has some issue, so the problem I'm facing is not related to the application or the type of query but to the hardware itself.

Has anybody faced this kind of issue?

pQd · Accepted Answer

Generic troubleshooting suggestions:

Start gathering runtime statistics of the server. There's a wide choice of tools: https://munin-monitoring.org/, https://grafana.com/ + InfluxDB + Telegraf, and many more. Regardless of the solution, you should keep historical statistics of:
- number of disk operations per second
- latency of the disk storage [regardless of whether it's spinning rust, SSD, NVMe or network-attached]
- server CPU usage, load, memory usage
Also gather statistics about PostgreSQL itself; https://www.percona.com/downloads/pmm2 might be helpful here.

Based on those stats, see if there's any build-up before the problematic queries happen.
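
If you want a quick look at the disk numbers before a full monitoring stack is in place, here is a minimal Python sketch. It assumes a Linux host where /proc/diskstats is available; the function names and the device name used in the note below are placeholders of mine, not something from the question. It derives operations per second and the average time per operation from two samples of the kernel counters.

```python
import time


def parse_diskstats(text):
    """Parse the content of /proc/diskstats into {device: counters}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpected lines
        stats[fields[2]] = {
            "reads": int(fields[3]),      # reads completed
            "read_ms": int(fields[6]),    # time spent reading, in ms
            "writes": int(fields[7]),     # writes completed
            "write_ms": int(fields[10]),  # time spent writing, in ms
        }
    return stats


def disk_rates(before, after, seconds):
    """Return (ops_per_second, avg_latency_ms) between two samples of one device."""
    ops = (after["reads"] - before["reads"]) + (after["writes"] - before["writes"])
    busy_ms = (after["read_ms"] - before["read_ms"]) + (after["write_ms"] - before["write_ms"])
    iops = ops / seconds
    latency = busy_ms / ops if ops else 0.0  # no operations means no latency to report
    return iops, latency


def sample(device, interval=5.0, path="/proc/diskstats"):
    """Take two samples `interval` seconds apart and compute the rates."""
    with open(path) as f:
        first = parse_diskstats(f.read())[device]
    time.sleep(interval)
    with open(path) as f:
        second = parse_diskstats(f.read())[device]
    return disk_rates(first, second, interval)
```

Calling something like sample("sda") in a loop and logging the result with a timestamp would give a crude history to compare against the times the queries hang. A proper monitoring tool is still the better long-term choice, since it also keeps CPU, load and memory side by side.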

An occasional slowdown might be caused by:

- uneven performance of the storage subsystem [SSD at the end of its life, patrol read on the RAID array, HDD reallocating data due to bad sectors]
- incorrect index statistics leading to a suboptimal query plan
- overload of the system by incoming queries
- overload of the system by other workloads running on the same hardware, or noisy neighbors if you're running in a virtualized environment
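
For the stale-statistics case, here is a minimal Python sketch of how one might flag tables whose planner statistics look out of date. It assumes you fetch rows from the pg_stat_user_tables view with whatever client you already use; the function name and the 20% threshold are arbitrary choices for illustration, not recommended values.

```python
# Query to run with any PostgreSQL client; it reads the standard
# pg_stat_user_tables statistics view.
STATS_QUERY = """
SELECT relname, n_live_tup, n_mod_since_analyze
FROM pg_stat_user_tables
"""


def stale_tables(rows, max_changed_fraction=0.20):
    """Return names of tables where many rows changed since the last ANALYZE.

    rows: iterable of (relname, n_live_tup, n_mod_since_analyze) tuples.
    max_changed_fraction: arbitrary illustrative threshold.
    """
    stale = []
    for relname, live, modified in rows:
        # Treat an empty table as size 1 to avoid division by zero.
        if modified / max(live, 1) > max_changed_fraction:
            stale.append(relname)
    return stale
```

Running ANALYZE on a flagged table refreshes its statistics, after which you can compare the EXPLAIN output of the slow query before and after.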

Tags:

postgresql

postgresql-10

Riccardo
