Postgres - Update on single record slow on ver 10+

Tags: sql, postgresql

We have a very simple table, DDL is as follows:

CREATE TABLE public.next_id (
  id varchar(255) NOT NULL,
  next_value int8 NOT NULL,
  CONSTRAINT next_id_pk PRIMARY KEY (id)
);

The table only has about 120 rows, and does not have any indexes apart from the primary key.

When I run the following UPDATE query from DBeaver against either of two Linux servers, running PostgreSQL 10.5 and 11.2 respectively, it takes approx 50ms:

update NEXT_ID set next_value=next_value+1 where id='Session';

However, if I point DBeaver at a server running PostgreSQL 9.5.3, it takes approx 3ms on average.

Now if I create a FOR loop as follows:

do $$
begin
    for i in 1..100000 loop
        update NEXT_ID set next_value=next_value+1 where id='Session';
    end loop;
end;
$$;

It takes approximately the same time (~1.5s) on all of the machines. In other words, any difference between the servers is within the margin of error, roughly the size of the extra delay seen on the single-record update.

It feels like there is some sort of overhead involved with the transaction around the statements.
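
One way to test that hunch, as a sketch (PostgreSQL 11+ only, since only procedures can commit inside a loop; bump_n is a hypothetical name), is to force a commit on every iteration and see whether the per-row cost jumps back up to ~50ms on the slow servers:

CREATE PROCEDURE bump_n(n int)
LANGUAGE plpgsql
AS $$
BEGIN
    FOR i IN 1..n LOOP
        UPDATE next_id SET next_value = next_value + 1 WHERE id = 'Session';
        COMMIT;  -- one WAL flush per iteration, unlike the single-transaction DO block
    END LOOP;
END;
$$;

-- Must be called outside an explicit transaction block:
CALL bump_n(1000);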

How can I get more information about where PostgreSQL is spending the time?
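
The standard starting point is statement-duration logging. A minimal sketch, assuming superuser rights to change settings (the same values can be set in postgresql.conf instead):

ALTER SYSTEM SET log_min_duration_statement = 0;  -- log every statement's duration
ALTER SYSTEM SET log_duration = on;
SELECT pg_reload_conf();                          -- apply without a restart

With the threshold at 0, clients using the extended query protocol get separate parse/bind/execute duration lines in the log, which is the kind of output shown in UPDATE2 below.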

I have tried running EXPLAIN ANALYSE on the above single-record UPDATE on the 'slower' servers, and I get the following:

Update on next_id  (cost=0.00..2.58 rows=1 width=36) (actual time=0.057..0.057 rows=0 loops=1)
  ->  Seq Scan on next_id  (cost=0.00..2.58 rows=1 width=36) (actual time=0.043..0.044 rows=1 loops=1)
        Filter: ((id)::text = 'Session'::text)
        Rows Removed by Filter: 125
Planning Time: 0.066 ms
Execution Time: 0.080 ms

Which seems to indicate that the query is actually taking only a fraction of a millisecond to plan and execute. So where is the rest of the time being taken?
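
Worth noting: EXPLAIN ANALYSE only times planning and execution of the statement itself; the commit (and its WAL flush) happens afterwards and never appears in the plan. To rule out slow block I/O during the update itself, a sketch assuming superuser rights (track_io_timing is superuser-settable):

SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
update NEXT_ID set next_value=next_value+1 where id='Session';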

All the servers involved are using the same database that has been replicated across all of them.

By the way, I'm not interested in anyone telling me that using a VARCHAR(255) for a primary key is a bad idea, as this is the same across all the servers, and it's not the point of this question.


UPDATE: We've noticed that a major difference between a Linux machine that is slow (mine) and another that is fast is the filesystem. My machine has the Postgres data directory on BTRFS, whereas the faster machine is using XFS.

A quick google for Postgres on various filesystems turns up a few people saying that running Postgres on BTRFS is (don't cross the streams!) bad.

We are going to attempt to reformat my machine to use XFS to see if that makes a difference.

In the meantime, I'm still interested in giving the bounty to anyone who can tell me how to log where the extra time is being spent.


UPDATE2: Following the suggestion by Nick Barnes in the comments, I ran a sequence of BEGIN; UPDATE ...; COMMIT; statements explicitly, and the log gave me the following output:

LOG:  duration: 0.025 ms  parse <unnamed>: begin
LOG:  duration: 0.014 ms  bind <unnamed>: begin
LOG:  duration: 0.003 ms  execute <unnamed>: begin
LOG:  duration: 0.045 ms  parse <unnamed>:  update NEXT_ID set next_value=next_value+1 where id='Session'
LOG:  duration: 0.055 ms  bind <unnamed>:  update NEXT_ID set next_value=next_value+1 where id='Session'
LOG:  duration: 0.059 ms  execute <unnamed>:  update NEXT_ID set next_value=next_value+1 where id='Session'
LOG:  duration: 0.004 ms  parse <unnamed>:  commit
LOG:  duration: 0.003 ms  bind <unnamed>:  commit
LOG:  duration: 50.237 ms  execute <unnamed>:  commit

So yes, Nick, the overhead is definitely in the COMMIT. But what is it doing? Is there any way to get more detailed information in the log about what it is doing during that 50ms?
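
Postgres doesn't log the internals of a single commit, but you can at least confirm which sync call it is waiting on and then benchmark that call on the filesystem directly. A sketch of what to check (the settings are standard; pg_test_fsync ships with Postgres, and the path below is a placeholder):

SHOW fsync;               -- on: COMMIT waits for the WAL to reach disk
SHOW synchronous_commit;  -- on by default
SHOW wal_sync_method;     -- typically fdatasync on Linux

-- Then, from the shell, time the sync calls on the WAL's filesystem:
--   pg_test_fsync -f /path/to/pg_wal/testfile
-- Comparing its per-operation latencies on the BTRFS and XFS machines
-- should show where the ~50ms goes.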

Asked Apr 03 '19 by DuncanKinnear


1 Answer

The UPDATE itself is fairly cheap; before the transaction is committed, there is no need for your new data to survive a crash, so the changes are only made to in-memory buffers (which the server flushes to disk at its leisure).

It's not until you commit the transaction that the server needs to give you a guarantee of durability. Postgres handles crash safety using a write-ahead log (WAL), and when you COMMIT, you're waiting for the WAL to be synchronously written to disk. (This is also why your DO loop takes the same time everywhere: the whole block runs as a single implicit transaction, so all 100,000 updates share one commit and one WAL flush.)

This makes the commit latency highly dependent on the file system and the underlying hardware, and if your PG10 instance is waiting around for BTRFS to finish a copy-on-write or something, that could certainly account for the difference you're seeing.

To confirm that this is the cause, you can skip the synchronous disk write by disabling fsync (though this puts you at risk of data corruption, so be sure to test it on a throwaway instance). A safer and less intrusive option is to SET LOCAL synchronous_commit = off at the start of your transaction, which should have the same effect, provided you don't have synchronous replication running.
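
For example, a minimal sketch of the second option (the UPDATE is the one from the question):

BEGIN;
-- SET LOCAL lasts only until the end of this transaction.
-- With synchronous_commit off, COMMIT returns before the WAL flush:
-- a crash may lose this transaction, but cannot corrupt the database.
SET LOCAL synchronous_commit = off;
update NEXT_ID set next_value=next_value+1 where id='Session';
COMMIT;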

Answered Oct 17 '22 by Nick Barnes