Increase PostgreSQL write speed at the cost of likely data loss?

Tags:

I love that PostgreSQL is crash resistant, as I don't want to spend time fixing a database. However, I'm sure there must be some things I can disable/modify so that inserts/updates will work faster even if I lose a couple records prior to a power-outage / crash. I'm not worried about a couple records - just the database as a whole.

I am trying to optimize PostgreSQL for large amounts of writes. It currently takes 22 minutes to insert 1 million rows which seems a bit slow.

How can I speed up PostgreSQL writes?

Some of the options I have looked into (like full_page_writes), seem to also run the risk of corrupting data which isn't something I want. I don't mind lost data - I just don't want corruption.

Update 1

Here is the table I am using - this since most of the tables will contain ints and small strings this "sample" table seems to be the best example of what I should expect.

CREATE TABLE "user" (   id serial NOT NULL,   username character varying(40),   email character varying(70),   website character varying(100),   created integer,   CONSTRAINT user_pkey PRIMARY KEY (id) ) WITH ( OIDS=FALSE ); CREATE INDEX id ON "user" USING btree (id);

I have about 10 scripts each issuing 100,000 requests at a time using prepared statements. This is to simulate a real-life load my application will be giving the database. In my application each page has 1+ inserts.

Update 2

I am using asynchronous commits already, because I have

synchronous_commit = off

in the main configuration file.

896

asked Feb 27 '11 03:02

Xeoncross

2 Answers

1M records inserted in 22 minutes works out to be 758 records/second. Each INSERT here is an individual commit to disk, with both write-ahead log and database components to it eventually. Normally I expect that even good hardware with a battery-backed cache and everything you will be lucky to hit 3000 commit/second. So you're not actually doing too bad if this is regular hardware without such write acceleration. The normal limit here is in the 500 to 1000 commits/second range in the situation you're in, without special tuning for this situation.

As for what that would look like, if you can't make the commits include more records each, your options for speeding this up include:

Turn off synchronous_commit (already done)
Increase wal_writer_delay. When synchronous_commit is off, the database spools commits up to be written every 200ms. You can make that some number of seconds instead if you want to by tweaking this upwards, it just increases the size of data loss after a crash.
Increase wal_buffers to 16MB, just to make that whole operation more efficient.
Increase checkpoint_segments, to cut down on how often the regular data is written to disk. You probably want at least 64 here. Downsides are higher disk space use and longer recovery time after a crash.
Increase shared_buffers. The default here is tiny, typically 32MB. You have to increase how much UNIX shared memory the system has to allocate. Once that's done, useful values are typically >1/4 of total RAM, up to 8GB. The rate of gain here falls off above 256MB, the increase from the default to there can be really helpful though.

That's pretty much it. Anything else you touched that might help could potentially cause data corruption in a crash; these are all completely safe.

158

answered Sep 29 '22 03:09

Greg Smith

22 minutes for 1 million rows doesn't seem that slow, particularly if you have lots of indexes.

How are you doing the inserts? I take it you're using batch inserts, not one-row-per-transaction.

Does PG support some kind of bulk loading, like reading from a text file or supplying a stream of CSV data to it? If so, you'd probably be best advised to use that.

Please post the code you're using to load the 1M records, and people will advise.

Please post:

CREATE TABLE statement for the table you're loading into
Code you are using to load in
small example of the data (if possible)

EDIT: It seems the OP isn't interested in bulk-inserts, but is doing a performance test for many single-row inserts. I will assume that each insert is in its own transaction.

Consider batching the inserts on the client-side, per-node, writing them into a temporary file (hopefully durably / robustly) and having a daemon or some periodic process which asynchronously does a batch insert of outstanding records, in reasonable sized batches.
This per-device batching mechanism really does give the best performance, in my experience, in audit-data like data-warehouse applications where the data don't need to go into the database just now. It also gives the application resilience against the database being unavailable.
Of course you will normally have several endpoint devices creating audit-records (for example, telephone switches, mail relays, web application servers), each must have its own instance of this mechanism which is fully independent.
This is a really "clever" optimisation which introduces a lot of complexity into the app design and has a lot of places where bugs could happen. Do not implement it unless you are really sure you need it.

answered Sep 29 '22 05:09

MarkR

Related questions
                            
                                Unit-Testing Databases
                            
                                Should you make a self-referencing table column a foreign key?
                            
                                How to parse the data from Google Alerts?
                            
                                Multiple and single indexes
                            
                                How to get the raw 'created_at' value in the database (not an object cast to an ActiveSupport::TimeWithZone)
                            
                                How do you join two tables on a foreign key field using django ORM?
                            
                                DataGrip added value compared to IntelliJ IDEA
                            
                                How to execute an Oracle stored procedure via a database link
                            
                                MySQL: Why use VARCHAR(20) instead of VARCHAR(255)? [duplicate]
                            
                                postgresql index on string column
                            
                                Querying oracle clob column
                            
                                Spring data : CrudRepository's save method and update
                            
                                One table or many? [closed]
                            
                                Codeigniter - Using Multiple Databases
                            
                                What's a real-world example of ACID? [closed]
                            
                                designing database to hold different metadata information
                            
                                How to call the Scan variadic function using reflection
                            
                                How to connect in java as SYS to Oracle?
                            
                                Group by multiple columns in Laravel
                            
                                In a join table, what's the best workaround for Rails' absence of a composite key?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Increase PostgreSQL write speed at the cost of likely data loss?

Tags:

database

postgresql