Stream with a lot of UPDATEs and PostgreSQL

I'm quite a newbie at PostgreSQL optimization and at judging which jobs it is appropriate for and which it is not. So, I want to know whether I'm trying to use PostgreSQL for an inappropriate job, or whether it is suitable for it and I just need to set everything up properly.

Anyway, I have a need for a database with a lot of data that changes frequently.

For example, imagine an ISP with a lot of clients, each having a session (PPP/VPN/whatever) with two self-describing, frequently updated properties: bytes_received and bytes_sent. There is a table for them, where each session is represented by a row with a unique ID:

CREATE TABLE sessions(
    id BIGSERIAL NOT NULL,
    username CHARACTER VARYING(32) NOT NULL,
    some_connection_data BYTEA NOT NULL,
    bytes_received BIGINT NOT NULL,
    bytes_sent BIGINT NOT NULL,
    CONSTRAINT sessions_pkey PRIMARY KEY (id)
)

And as accounting data flows in, this table receives a lot of UPDATEs like these:

-- There are *lots* of such queries!
UPDATE sessions SET bytes_received = bytes_received + 53554,
                    bytes_sent = bytes_sent + 30676
                WHERE id = 42

When we receive a never-ending stream with quite a lot of updates (like 1-2 per second) for a table with a lot of sessions (like several thousand), this makes PostgreSQL very busy, probably thanks to MVCC. Are there any ways to speed everything up, or is Postgres simply not suitable for this task, so that I'd better put those counters in another storage like memcachedb and use Postgres only for fairly static data? But then I'd lose the ability to run infrequent queries on this data, for example to find the TOP10 downloaders, which is not really good.
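
For example, the kind of TOP10 query I have in mind would be roughly like this (just a sketch against the sessions table above):

-- The sort of infrequent query I'd lose by moving the counters elsewhere:
SELECT username, SUM(bytes_received) AS total_received
  FROM sessions
 GROUP BY username
 ORDER BY total_received DESC
 LIMIT 10;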

Unfortunately, the amount of data cannot be lowered much. The ISP accounting example is made up just to simplify the explanation; the real problem is with another system whose structure is harder to explain.

Thanks for suggestions!

asked Nov 02 '09 by drdaeman


1 Answer

The database really isn't the best tool for collecting lots of small updates, but as I don't know your queryability and ACID requirements I can't really recommend something else. If it's an acceptable approach, the application-side update aggregation suggested by zzzeek can lower the update load significantly.
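
For illustration, such an aggregated flush could be applied in a single statement roughly like this (not zzzeek's actual code; the id/rx/tx values stand for deltas the application has accumulated in memory since the last flush):

-- Sketch only: apply many accumulated per-session deltas in one statement
-- instead of issuing one UPDATE per accounting packet.
UPDATE sessions AS s
   SET bytes_received = s.bytes_received + d.rx,
       bytes_sent     = s.bytes_sent     + d.tx
  FROM (VALUES (42, 535540, 306760),
               (43, 120000,  80000)) AS d(id, rx, tx)
 WHERE s.id = d.id;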

There is a similar approach that gives you durability and the ability to query the fresher data, at some performance cost. Create a buffer table that collects the changes to the values that need to be updated, and insert the changes there. At regular intervals, in a transaction, rename the table to something else and create a new table in its place. Then, in a transaction, aggregate all the changes, apply the corresponding updates to the main table and truncate the renamed buffer table. This way, if you need a consistent and fresh snapshot of any data, you can select from the main table and join in all the changes from the active and renamed buffer tables.
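
A rough sketch of that buffer-table approach, with made-up table and column names, could look like this:

-- The collector only ever INSERTs deltas into the buffer table, which is cheap:
CREATE TABLE session_deltas(
    session_id     BIGINT NOT NULL,
    bytes_received BIGINT NOT NULL,
    bytes_sent     BIGINT NOT NULL
);

-- Periodically swap the buffer out...
BEGIN;
ALTER TABLE session_deltas RENAME TO session_deltas_old;
CREATE TABLE session_deltas (LIKE session_deltas_old);
COMMIT;

-- ...then fold the collected deltas into the main table:
BEGIN;
UPDATE sessions AS s
   SET bytes_received = s.bytes_received + d.rx,
       bytes_sent     = s.bytes_sent     + d.tx
  FROM (SELECT session_id,
               SUM(bytes_received) AS rx,
               SUM(bytes_sent)     AS tx
          FROM session_deltas_old
         GROUP BY session_id) AS d
 WHERE s.id = d.session_id;
TRUNCATE session_deltas_old;
COMMIT;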

However, if neither is acceptable, you can also tune the database to deal better with heavy update loads.

To optimize the updating, make sure that PostgreSQL can use heap-only tuples (HOT) to store the updated versions of the rows. To do this, make sure there are no indexes on the frequently updated columns and lower the fillfactor from the default 100%. You'll need to figure out a suitable fillfactor on your own, as it depends heavily on the details of the workload and the machine it is running on. The fillfactor needs to be low enough that almost all of the updates fit on the same database page before autovacuum has a chance to clean up the old, no-longer-visible versions. You can tune the autovacuum settings to trade off between database density and vacuum overhead. Also take into account that any long transaction, including statistical queries, will hold onto tuples that changed after the transaction started. See the pg_stat_user_tables view to decide what to tune, especially the relationship of n_tup_hot_upd to n_tup_upd and of n_live_tup to n_dead_tup.
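
As a starting point, something along these lines would lower the fillfactor and let you watch whether HOT updates actually happen (the value 50 is only an illustrative guess, not a recommendation):

-- Leave free space on each page so updated row versions can stay on the same page.
ALTER TABLE sessions SET (fillfactor = 50);
-- Note: existing pages keep their current layout; a table rewrite
-- (e.g. CLUSTER) is needed before old rows see the new fillfactor.

-- Check whether updates are HOT and whether dead tuples are piling up:
SELECT n_tup_upd, n_tup_hot_upd, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 'sessions';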

Heavy updating will also create a heavy write-ahead log (WAL) load. Tuning the WAL behavior (see the docs for the settings) will help lower that. In particular, a higher checkpoint_segments number and a higher checkpoint_timeout can lower your I/O load significantly by allowing more updates to happen in memory. See the relationship of checkpoints_timed vs. checkpoints_req in pg_stat_bgwriter to see how many checkpoints happen because either limit is reached. Raising shared_buffers so that the working set fits in memory will also help. Check buffers_checkpoint vs. buffers_clean + buffers_backend to see how many buffers were written to satisfy checkpoint requirements vs. just running out of memory.
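
For example, the monitoring side of that could look like the query below; the configuration values in the comments are only illustrative starting points for the settings mentioned above, not recommendations:

-- postgresql.conf, illustrative values only:
--   checkpoint_segments = 32      -- more WAL between checkpoints
--   checkpoint_timeout  = 15min   -- checkpoint less often
--   shared_buffers      = 1GB     -- keep the working set in memory

-- Why do checkpoints happen, and who is writing buffers out?
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend
  FROM pg_stat_bgwriter;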

answered Sep 19 '22 by Ants Aasma