I have an Items table and a Jobs table; each Item belongs to a parent Job.
Items start out as IN_PROGRESS, work is performed on them, and they are handed off to a worker to update. I have an updater process that updates Items with a new status as they come in. The approach I have been using so far has been (in pseudocode):
def work(item: Item) = {
  insideTransaction {
    updateItemWithNewStatus(item)
    job, items = getParentJobAndAllItems(item)
    newJobStatus = computeParentJobStatus(job, items)
    // do some stuff depending on newJobStatus
  }
}
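To make the pseudocode concrete, here is a sketch of what its queries might look like, using Python's sqlite3 for illustration. The table and column names (`items.id`, `items.status`, `items.job_id`, `jobs.id`, `jobs.status`) and the all-items-complete rule are assumptions, not the asker's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs  (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY, job_id INTEGER, status TEXT);
    INSERT INTO jobs  VALUES (1, 'IN_PROGRESS');
    INSERT INTO items VALUES (10, 1, 'IN_PROGRESS'), (11, 1, 'COMPLETE');
""")

def work(item_id, job_id, new_status):
    with conn:  # one transaction, committed on success
        # updateItemWithNewStatus(item)
        conn.execute("UPDATE items SET status = ? WHERE id = ?",
                     (new_status, item_id))
        # getParentJobAndAllItems(item)
        job_status = conn.execute("SELECT status FROM jobs WHERE id = ?",
                                  (job_id,)).fetchone()[0]
        statuses = [r[0] for r in conn.execute(
            "SELECT status FROM items WHERE job_id = ?", (job_id,))]
        # computeParentJobStatus(job, items) -- assumed rule:
        # the job is COMPLETE when every item is COMPLETE
        new_job_status = ("COMPLETE" if all(s == "COMPLETE" for s in statuses)
                          else "IN_PROGRESS")
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?",
                     (new_job_status, job_id))
        return new_job_status

result = work(10, 1, "COMPLETE")
```

Note that under READ COMMITTED, two of these transactions can both read the items, both see "everything is COMPLETE", and both set the job to COMPLETE, which is exactly the duplicated-COMPLETE problem described in the question.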
Does that make sense? I want this to work in a concurrent environment. The issue I have right now is that COMPLETE is arrived at multiple times for a job, when I only want to run the on-COMPLETE logic once.
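One way (a sketch, not the only fix) to run the on-COMPLETE logic exactly once is to make the job's transition to COMPLETE a guarded, atomic UPDATE, and only run the logic in the transaction whose UPDATE actually changed a row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO jobs VALUES (1, 'IN_PROGRESS')")
conn.commit()

def try_complete(job_id):
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'COMPLETE' "
            "WHERE id = ? AND status <> 'COMPLETE'", (job_id,))
        # rowcount is 1 only for the single caller that won the transition;
        # every later caller finds no row matching the guard
        return cur.rowcount == 1

first = try_complete(1)   # performs the transition, returns True
second = try_complete(1)  # guard no longer matches, returns False
```

The same guarded-UPDATE shape works in PostgreSQL, where the row lock taken by the UPDATE serializes the competing transactions on that job row.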
If I change my transaction isolation level to SERIALIZABLE, I do get the "ERROR: could not serialize access due to read/write dependencies among transactions" error as described.
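If you stay on SERIALIZABLE, that error is the expected way the database reports the conflict, and the standard pattern is to retry the aborted transaction. A minimal sketch of such a retry loop (the `SerializationFailure` and `run_serializable` names here are hypothetical stand-ins; a real driver such as psycopg2 raises its own serialization-failure exception for SQLSTATE 40001):

```python
class SerializationFailure(Exception):
    """Stand-in for the driver's serialization-failure error (SQLSTATE 40001)."""

def run_serializable(txn_fn, max_retries=5):
    """Run txn_fn, retrying when the database aborts it with a
    serialization failure; each attempt would be a fresh transaction."""
    for _ in range(max_retries):
        try:
            return txn_fn()
        except SerializationFailure:
            continue  # the DB rolled the transaction back; safe to re-run
    raise RuntimeError("transaction kept conflicting; giving up")

# Demo with a fake transaction that conflicts twice, then succeeds.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure()
    return "COMPLETE"

result = run_serializable(flaky_txn)
```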
So my questions are:
Edit: I have reopened this question because I was not satisfied with the previous answer's explanation. Can anyone explain this for me? Specifically, I would like some example queries for that pseudocode.
The Serializable isolation level provides the strictest transaction isolation. This level emulates serial transaction execution for all committed transactions; as if transactions had been executed one after another, serially, rather than concurrently.
Under READ COMMITTED (called Cursor Stability, CS, in some databases such as DB2), access to the uncommitted data of other transactions is not possible. However, nonrepeatable reads and phantom reads are possible. READ COMMITTED is the default isolation level. It is suitable when you want maximum concurrency and need to see only committed data.
It prevents the reader from seeing any uncommitted (think: intermediate, "dirty") data. Still, it does not exclude phantoms or non-repeatable reads. REPEATABLE READ guarantees all of the above and additionally guarantees that once data are read, they won't change.
SERIALIZABLE is the strongest isolation level, because it prevents all four of the concurrency problems already discussed.
If you want the jobs to be able to run concurrently, neither SERIALIZABLE nor SELECT FOR UPDATE will work directly.
If you lock the row using SELECT FOR UPDATE, then another process will simply block when it executes its own SELECT FOR UPDATE, until the first process commits its transaction.
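That blocking behaviour can be roughly illustrated with SQLite, which locks at the database level rather than at the row level as PostgreSQL's SELECT ... FOR UPDATE does, but the shape is the same: the second writer waits (here, until it times out) while the first holds its transaction open, and proceeds once the first commits. This is an analogy, not PostgreSQL semantics:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path, timeout=0.2)  # short busy timeout for the demo
b = sqlite3.connect(path, timeout=0.2)
a.execute("CREATE TABLE t (id INTEGER)")
a.commit()

a.execute("BEGIN IMMEDIATE")          # first process takes the write lock
blocked = False
try:
    b.execute("BEGIN IMMEDIATE")      # second process waits, then times out
except sqlite3.OperationalError:      # "database is locked"
    blocked = True

a.commit()                            # first process releases the lock
b.execute("BEGIN IMMEDIATE")          # now the second process proceeds
b.commit()
```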
If you use SERIALIZABLE, both processes can run concurrently (processing the same row), but at least one should be expected to fail by the time it does a COMMIT, since the database will detect the conflict. SERIALIZABLE might also fail if it conflicts with any other queries going on in the database at the same time that affect related rows. The real reason to use SERIALIZABLE is precisely if you are trying to protect against concurrent database updates made by other jobs, as opposed to blocking the same job from executing twice.
Note there are tricks to make SELECT FOR UPDATE skip locked rows. If you do that, then you can have actual concurrency. See "Select unlocked row in Postgresql".
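In PostgreSQL 9.5 and later, this is the FOR UPDATE SKIP LOCKED clause. A sketch of such a claim query, wrapped in Python (the table and column names are assumptions, and `conn` would be a live psycopg2-style connection, so this is not runnable against a real database here):

```python
# Each worker runs this inside its own transaction; locked rows are
# skipped instead of blocking, so concurrent workers claim different rows.
CLAIM_NEXT_ITEM = """
    SELECT id
    FROM items
    WHERE status = 'PENDING'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
"""

def claim_next_item(conn):
    cur = conn.cursor()
    cur.execute(CLAIM_NEXT_ITEM)
    row = cur.fetchone()
    # None means everything is either done or currently claimed by others
    return row[0] if row else None
```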
Another approach I see more often is to give your "status" column a third, temporary state that is used while a job is being processed. Typically one would have states like 'PENDING', 'IN_PROGRESS', and 'COMPLETE'. When your process searches for work to do, it finds a 'PENDING' job, immediately moves it to 'IN_PROGRESS', and commits the transaction; it then carries on with the work and finally moves the job to 'COMPLETE'. The disadvantage is that if the process dies while processing a job, the job will be left 'IN_PROGRESS' indefinitely.
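The three-state approach can be sketched like this (SQLite again, with an assumed schema): the guarded UPDATE claims a 'PENDING' row atomically, so two workers can never both claim the same one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'PENDING')")
conn.commit()

def claim(item_id):
    with conn:  # short transaction: claim and commit immediately
        cur = conn.execute(
            "UPDATE items SET status = 'IN_PROGRESS' "
            "WHERE id = ? AND status = 'PENDING'", (item_id,))
        return cur.rowcount == 1  # True only for the worker that won

won = claim(1)    # claims the item
lost = claim(1)   # a second claim finds no PENDING row
```

To mitigate the stuck-'IN_PROGRESS' problem, one common refinement (an assumption on my part, not something from the question) is to also record a claimed-at timestamp when claiming, so a sweeper can return long-stale 'IN_PROGRESS' rows to 'PENDING'.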