Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select for Update using limit in Postgres

I need to distribute the processing on records to different machines.

On each of these machines(nodes), I run the following query every minutes:

select * 
from RECORDS_TO_PROCESS
limit MAX_PER_MACHINE_RUN
for update

Hypothetically, assume that I have 1000 records to process, 10 machines(nodes) and each machine wants to process 100 records(max) in a minute.

R1
R2
R3
...
R101
R102
R103
...
R201
R202
R203
...

All I want is for Records (R1-100) to get to query made from Node1, Records (R101-R200) to be returned from query made from Node2 etc.

Will this work? Do we think that this would be able to spread load?

Assume that the transaction isolation level is the default one (Read Committed)

like image 924
user965692 Avatar asked Jun 22 '26 10:06

user965692


1 Answers

That won't work because the first query to lock a row for update will block all subsequent queries attempting to lock that same row, until the first one commits. Version 9.5 will introduce SKIP LOCKED so that they will not block each other, but rather all get different rows.

Until 9.5 is out, you could use pg_try_advisory_lock in the WHERE clause to emulate the SKIP LOCKED feature. You would have to arrange for each row to have a unique(ish) identifier to feed to pg_try_advisory_lock.

But I think that this is a mistaken design anyway. You almost always want to first claim a row by writing some identifier (like node + pid) into a column and committing it, and then do a second transaction to mark it as done once it is done.

And if you are limiting it to 100 per minute, I don't see a reason to try to select 100 at a time. Just do one at a time, once every half second. the overhead should be tolerable, unless your database has a very high ping time.

like image 100
jjanes Avatar answered Jun 24 '26 12:06

jjanes