I have a procedure that updates record values, and I want to run it against all records in a table (over 30k records). Procedure execution time ranges from 2 up to 10 seconds, because it depends on network load.
Right now I'm doing UPDATE table SET field = procedure_name(params); but with that number of records it takes up to 40 minutes to process the whole table.
To speed this up I'm currently using 4 different connections, which fork to the background, each firing the query with a WHERE clause that splits the rows by the modulo of the row id (WHERE id_field % 4 = n, for n = 0..3). This works well and cuts the time to populate the table down to ~10 minutes.
But I want to avoid using cron, shell jobs and multiple connections for this. I know it can be done with libpq, but is there a way to fire a query (4 different non-blocking queries) and not wait for it to finish, within a single connection?
Or can anyone point me to some clues on how to write such a function using Postgres internals, or simply in C, and bind it as a stored procedure?
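To illustrate what I'm after: something in the spirit of the dblink extension's asynchronous functions would do, although dblink still opens extra server connections under the hood rather than truly multiplexing one connection, so it may not be exactly what I'm asking for. A rough sketch, reusing the placeholder names from above:

-- requires: CREATE EXTENSION dblink;
SELECT dblink_connect('c0', 'dbname=mydb');
SELECT dblink_connect('c1', 'dbname=mydb');
SELECT dblink_connect('c2', 'dbname=mydb');
SELECT dblink_connect('c3', 'dbname=mydb');

-- fire all four partitioned updates without waiting for any of them;
-- the WITH ... count(*) wrapper (PostgreSQL 9.1+) just gives each query a
-- single-row result to fetch back later
SELECT dblink_send_query('c0', $$WITH u AS (UPDATE my_table SET field = procedure_name(params) WHERE id_field % 4 = 0 RETURNING 1) SELECT count(*) FROM u$$);
SELECT dblink_send_query('c1', $$WITH u AS (UPDATE my_table SET field = procedure_name(params) WHERE id_field % 4 = 1 RETURNING 1) SELECT count(*) FROM u$$);
-- ... same for 'c2' and 'c3' with % 4 = 2 and % 4 = 3

-- collect the results; each call blocks only until that particular query finishes
SELECT * FROM dblink_get_result('c0') AS t(updated bigint);
SELECT * FROM dblink_get_result('c1') AS t(updated bigint);
-- ... 'c2', 'c3'

SELECT dblink_disconnect('c0');  -- and 'c1' / 'c2' / 'c3'

Whether that counts as "a single connection" is debatable, but at least it can all be driven from one session without cron or shell jobs.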
Cheers Darius
It is possible to update multiple rows at once. Below is an example in Postgres:
UPDATE
    table_name
SET
    column_name = temp.column_name
FROM
    (VALUES
        (<id1>, <value1>),
        (<id2>, <value2>),
        (<id3>, <value3>)
    ) AS temp("id", "column_name")
WHERE
    table_name.id = temp.id;
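For example, filled in with made-up names and values (you may need explicit casts such as 1::bigint or 'active'::text if Postgres cannot infer the types of the VALUES list):

UPDATE
    my_table
SET
    status = temp.status
FROM
    (VALUES
        (1, 'active'),
        (2, 'inactive'),
        (3, 'active')
    ) AS temp("id", "status")
WHERE
    my_table.id = temp.id;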
I've got a sure answer for this question - IF you will share with us what your ab workout is!!! I'm getting fat by the minute and I need answers myself...
OK I'll answer anyway.
If you are updating one table, on one database server, in 40 minutes 'single threaded' and in 10 minutes with 4 threads, the bottleneck is not the database server; if it were, the four threads would just get bogged down in its I/O and you wouldn't see the speedup. If you are executing a bunch of UPDATEs, one call per record, the network round-trip time is what's killing you.
I'm pretty sure this is the case, and not an I/O bottleneck on the DB or procedure_name(params) itself taking a long time. (If the procedure really took 2-10 seconds per call, 30K records would take something like 2500 minutes.) The reason I'm sure is that starting 4 concurrent processes cuts the time to a quarter, so in particular it is not an I/O issue on the DB server.
This might be the one excuse for putting business logic in a stored procedure on the server. Optimization unfortunately means breaking the rules; the consequence is harder maintenance. But, duh!!
However, the best solution would be to get this set up to use 'bulk update' queries. That might mean you have to take several strange and unintuitive steps such as this:
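One rough sketch of the idea (table and column names are made up, and it assumes the new values can be computed up front): instead of 30K separate UPDATE calls, compute all the values first, bulk-load them into a temporary table in one shot, then apply them with a single set-based UPDATE:

-- 1. stage the precomputed values; load them with COPY or one multi-row INSERT,
--    not one INSERT per record
CREATE TEMP TABLE new_values (id integer PRIMARY KEY, field text);
COPY new_values FROM STDIN;   -- or: INSERT INTO new_values VALUES (...), (...), ...

-- 2. apply everything in a single statement, i.e. a single network round trip
UPDATE my_table AS t
SET    field = nv.field
FROM   new_values AS nv
WHERE  t.id_field = nv.id;

That turns roughly 30,000 round trips into two or three.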
I'm not making any guarantees, especially as I look down at my ever-fattening belly, but this has the potential to melt your update job down to under a minute.