
Postgres 10.3: SELECT queries hang for hours

My application uses Postgres 10.3 as its DBMS, with the PostGIS extension installed.

Occasionally, at random intervals, the DBMS becomes slow and gets stuck on a few SELECT queries.

In pg_stat_activity, the wait_event_type and wait_event of these queries are as follows:

 select wait_event_type, wait_event from pg_stat_activity where state='active'; 
 wait_event_type |  wait_event  
-----------------+--------------
 IO              | DataFileRead
 IO              | DataFileRead
 IO              | DataFileRead
 IO              | DataFileRead
 LWLock          | buffer_io
 LWLock          | buffer_io
 IO              | DataFileRead
 LWLock          | buffer_io
 LWLock          | buffer_io
 IO              | DataFileRead
 IO              | DataFileRead
 LWLock          | buffer_io
 LWLock          | buffer_io
 IO              | DataFileRead
 LWLock          | buffer_io
 IO              | DataFileRead
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 LWLock          | buffer_io
 IO              | DataFileRead
 IO              | DataFileRead
                 | 
 IO              | DataFileRead
 LWLock          | buffer_io
 LWLock          | buffer_io
(33 rows)
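
A tally makes the listing above easier to read. The following is a minimal Python sketch, not part of the original post: `tally_wait_events` is a hypothetical helper, and the rows are assumed to be `(wait_event_type, wait_event)` tuples already fetched from pg_stat_activity with whatever driver the application uses.

```python
from collections import Counter

# Query from the question; run it with any PostgreSQL driver to obtain the rows.
WAIT_EVENT_QUERY = (
    "SELECT wait_event_type, wait_event "
    "FROM pg_stat_activity WHERE state = 'active'"
)


def tally_wait_events(rows):
    """Count active backends per (wait_event_type, wait_event) pair.

    NULL wait events (None in Python) mean the backend is not waiting.
    """
    counts = Counter(
        (event_type or "(not waiting)", event or "(not waiting)")
        for event_type, event in rows
    )
    # Most frequent wait first.
    return counts.most_common()


if __name__ == "__main__":
    # Hand-written sample rows for illustration, not real server output.
    sample = [
        ("IO", "DataFileRead"),
        ("LWLock", "buffer_io"),
        ("LWLock", "buffer_io"),
        (None, None),
    ]
    for (event_type, event), n in tally_wait_events(sample):
        print(f"{n:3d}  {event_type:15s} {event}")
```

Counting the 33 rows listed above by hand gives 20 backends in LWLock/buffer_io, 12 in IO/DataFileRead and one that is not waiting. In other words, nearly every active query is either reading a data file or waiting for a page read to finish, which fits a read bottleneck in the storage layer but does not by itself prove a hardware fault.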

My assumption, after checking the docs, is that the underlying hardware has a problem, so the issue I'm facing is related not to the application or the type of query but to the hardware itself.

Has anybody faced this kind of issue?

Riccardo asked Dec 14 '18 17:12


1 Answer

generic troubleshooting suggestions:

  • start gathering runtime statistics of the server. there's a wide choice of tools: https://munin-monitoring.org/, https://grafana.com/ + InfluxDB + Telegraf, and many more. regardless of the solution, you should keep historical statistics of:

    • number of disk operations per second
    • latency of the disk storage [ regardless of whether it's spinning rust, ssd, nvme or network-attached ]
    • server CPU usage, load, memory usage
  • also gather statistics about postgresql itself; https://www.percona.com/downloads/pmm2 might be helpful here

based on those stats, see if there's any build-up before the problematic queries happen.
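
as a quick spot check while proper monitoring is being set up, disk operations per second and average latency can be derived from two samples of /proc/diskstats. this is a rough Python sketch that assumes a Linux host; `parse_diskstats` and `disk_rates` are hypothetical helper names, and it does not replace the historical graphs the tools above provide.

```python
import os
import time


def parse_diskstats(text):
    """Parse /proc/diskstats text into {device: (reads, ms_reading, writes, ms_writing)}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # skip malformed or truncated lines
        # Columns: major minor name reads merged sectors ms_reading
        #          writes merged sectors ms_writing ...
        stats[fields[2]] = (
            int(fields[3]), int(fields[6]), int(fields[7]), int(fields[10])
        )
    return stats


def disk_rates(before, after, interval_s):
    """Return {device: (ops_per_second, avg_latency_ms)} between two samples."""
    rates = {}
    for name, (r1, rms1, w1, wms1) in after.items():
        if name not in before:
            continue
        r0, rms0, w0, wms0 = before[name]
        ops = (r1 - r0) + (w1 - w0)
        spent_ms = (rms1 - rms0) + (wms1 - wms0)
        # Average time per completed operation; 0.0 when the device was idle.
        latency_ms = spent_ms / ops if ops else 0.0
        rates[name] = (ops / interval_s, latency_ms)
    return rates


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    interval = 1.0  # arbitrary sampling interval, in seconds
    with open("/proc/diskstats") as f:
        first = parse_diskstats(f.read())
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        second = parse_diskstats(f.read())
    for name, (iops, latency) in sorted(disk_rates(first, second, interval).items()):
        print(f"{name:12s} {iops:10.1f} ops/s {latency:8.2f} ms/op")
```

running it in a loop and logging the output next to the timestamps of the slow queries is a crude way to see whether latency climbs before the hang.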

occasional slowdowns might be caused by:

  • uneven performance of the storage subsystem [ ssd at the end of its life, patrol-read on the RAID array, hdd reallocating data due to bad sectors ]
  • incorrect index statistics leading to a suboptimal query plan
  • overload of the system by incoming queries
  • overload of the system by other workloads running on the same hardware, noisy neighbors if you're running in a virtualized environment
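
the stale-statistics cause in the list above can be checked from pg_stat_user_tables. this is a minimal Python sketch under stated assumptions: `stale_tables` is a hypothetical helper, the rows are assumed to come from the query in `STATS_QUERY` via any driver that returns timezone-aware datetimes, and the one-day and 1000-row thresholds are arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Columns of the pg_stat_user_tables view; run with any PostgreSQL driver.
STATS_QUERY = (
    "SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze "
    "FROM pg_stat_user_tables"
)


def stale_tables(rows, now, max_age=timedelta(days=1), min_changes=1000):
    """Return (table, last analyzed, rows changed) for tables with old statistics.

    A table is reported when it was never analyzed, or was last analyzed
    more than max_age ago, and has at least min_changes modified rows since.
    """
    stale = []
    for relname, last_analyze, last_autoanalyze, changed in rows:
        analyzed = [t for t in (last_analyze, last_autoanalyze) if t is not None]
        latest = max(analyzed) if analyzed else None
        too_old = latest is None or now - latest > max_age
        if too_old and changed >= min_changes:
            stale.append((relname, latest, changed))
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Made-up rows for illustration; the table names are hypothetical.
    sample = [
        ("places", None, None, 250000),
        ("routes", now - timedelta(hours=2), None, 40000),
    ]
    for relname, latest, changed in stale_tables(sample, now):
        print(f"{relname}: last analyzed {latest}, {changed} rows changed since")
```

a table that shows up here is a candidate for a manual ANALYZE, after which the plan of the slow query can be compared with EXPLAIN.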

pQd answered Nov 08 '22 08:11