How can you auto-increment a non-primary ID column scoped to another table?

Tags:

sql

postgresql

Let's imagine that we're building GitHub and we have two tables: repos and issues. Every GitHub repo has a collection of issues, and so the issues table has a foreign key of repo_id.

Now, when you're browsing a GitHub repo's issues, you don't want to be exposed to the internal id. Instead, you want something like a number, which increments from 1..n for only that repository. You want your first issue in your new repo to be numbered 1, not whatever the next id for an issue on GitHub is.

Of course, you need a way to increment, and you want to make sure that the number is unique when scoped to the repo. In particular, you want to avoid any race condition in which the same number is generated twice.
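For concreteness, here is a minimal PostgreSQL sketch of the schema described above. Only repos, issues, id, repo_id and number come from the question; the name and title columns and the chosen types are assumptions. The UNIQUE (repo_id, number) constraint is what guarantees at the database level that a number can never appear twice within one repo, whichever method computes the next number.

```sql
-- Sketch of the two tables from the question (name/title columns are hypothetical).
CREATE TABLE repos (
    id   bigserial PRIMARY KEY,
    name text      NOT NULL UNIQUE
);

CREATE TABLE issues (
    id      bigserial PRIMARY KEY,                   -- internal id, never shown to users
    repo_id bigint    NOT NULL REFERENCES repos (id),
    number  integer   NOT NULL,                      -- public number, 1..n within one repo
    title   text      NOT NULL,
    -- The same number can never appear twice in the same repo,
    -- no matter which approach assigns it.
    UNIQUE (repo_id, number)
);
```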

What's the most straightforward way of handling this? A trigger? Something else entirely?

I am using PostgreSQL but would prefer approaches that are vanilla SQL where possible, e.g. triggers. If there's a simpler Postgres approach, then that would also be useful.

Any code that demonstrates your approach would be extraordinarily useful. Thanks!

312

asked Aug 17 '16 22:08

Josh Smith

2 Answers

I don't think there is a way to do this without the possibility of a race condition. This should minimize race conditions but not eliminate them. There may be better ways within specific database architectures. Assuming "REPOSITORY_ID" is provided by your application code:

insert into issues (repo_id,line_id) values (
    REPOSITORY_ID,
    -- highest line_id in this repo plus one; 1 if the repo has no issues yet
    coalesce((select max(line_id) from issues where repo_id=REPOSITORY_ID),0)+1
);

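This pulls the current highest line_id for the repo and increments it at the time of the insert. If the repo has no issues yet, max() returns NULL, coalesce() turns that into 0, and the first issue gets line_id 1. A race is still possible: if two inserts for the same repo run at the same time, both can read the same maximum and compute the same line_id. The window is small, but if you enforce uniqueness with a unique constraint on (repo_id, line_id), the losing insert fails instead of creating a duplicate, and you can check for that error and retry.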
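The check-and-retry idea could be sketched as a PL/pgSQL function like the one below. The unique constraint, the function name insert_issue_with_retry and the attempt limit are illustrative assumptions, not part of the answer. The sketch relies on PostgreSQL's default READ COMMITTED isolation level: a volatile function takes a fresh snapshot for each statement, so on retry the max() subquery sees the row that won the race.

```sql
-- Minimal schema assumed for this sketch (column names follow the answer).
CREATE TABLE issues (
    id      bigserial PRIMARY KEY,
    repo_id bigint  NOT NULL,
    line_id integer NOT NULL,
    UNIQUE (repo_id, line_id)   -- a lost race becomes an error instead of a duplicate
);

-- Hypothetical helper: insert with max()+1 and retry when a concurrent
-- transaction claimed the same line_id first. Returns the assigned line_id.
CREATE FUNCTION insert_issue_with_retry(p_repo_id bigint, p_max_attempts integer DEFAULT 5)
RETURNS integer
LANGUAGE plpgsql AS $$
DECLARE
    new_line_id integer;
BEGIN
    FOR attempt IN 1..p_max_attempts LOOP
        BEGIN
            INSERT INTO issues (repo_id, line_id)
            VALUES (p_repo_id,
                    (SELECT coalesce(max(line_id), 0) + 1
                       FROM issues
                      WHERE repo_id = p_repo_id))
            RETURNING line_id INTO new_line_id;
            RETURN new_line_id;
        EXCEPTION WHEN unique_violation THEN
            -- Another transaction committed the same line_id first;
            -- loop again so the next statement sees its row.
            NULL;
        END;
    END LOOP;
    RAISE EXCEPTION 'could not allocate a line_id for repo % after % attempts',
        p_repo_id, p_max_attempts;
END;
$$;
```

Under REPEATABLE READ or SERIALIZABLE the snapshot is fixed for the whole transaction, so the retry would keep computing the same value; there the whole transaction has to be retried instead.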

117

answered Nov 13 '22 08:11

Ogre Codes

Suppose you want to add a new issue to a certain repo. You could execute the following operations:

1. Start a new explicit transaction.
2. Select the repo you want to modify with a SELECT ... FOR UPDATE. This puts a row-level lock on it, so any other transaction that follows the same procedure to add an issue to that repo has to wait instead of proceeding concurrently.
3. Get the new issue number for that repo in some way (for instance, you could keep a latest_issue column in repo, as in one of the answers, or you could run a query to find it).
4. Insert the new issue with the correct issue number.
5. End the transaction: committing releases the lock and allows other transactions that want to work on the same repo to continue.
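The five steps above can be run directly as one transaction, for example from psql. This sketch uses the query-based option for step 3; the minimal schema and the repo id 42 are hypothetical. It assumes PostgreSQL's default READ COMMITTED isolation level, under which a transaction that was waiting on the lock takes a new snapshot for its next statement and therefore sees the previous transaction's freshly committed issue when it computes max(number).

```sql
-- Minimal schema assumed for this sketch.
CREATE TABLE repos (
    id bigint PRIMARY KEY
);
CREATE TABLE issues (
    id      bigserial PRIMARY KEY,
    repo_id bigint  NOT NULL REFERENCES repos (id),
    number  integer NOT NULL,
    UNIQUE (repo_id, number)
);
INSERT INTO repos (id) VALUES (42);   -- hypothetical repo

-- 1. Start an explicit transaction.
BEGIN;

-- 2. Lock the repo row; other transactions running the same
--    SELECT ... FOR UPDATE on repo 42 block here until we finish.
SELECT id FROM repos WHERE id = 42 FOR UPDATE;

-- 3 + 4. Compute the next number with a query and insert the issue.
INSERT INTO issues (repo_id, number)
SELECT 42, coalesce(max(number), 0) + 1
  FROM issues
 WHERE repo_id = 42;

-- 5. Commit, which releases the row lock.
COMMIT;
```

The lock only serializes writers that take it, so every code path that inserts issues must go through the same SELECT ... FOR UPDATE.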

You could wrap these steps in a stored procedure and call it every time you want to insert a new issue. This prevents the race condition, and as long as there are not too many concurrent transactions inserting issues into the same repository, the lock is held only briefly and the approach remains reasonably efficient.
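A sketch of the stored-procedure version, written as a PL/pgSQL function and using the counter-column option for step 3. The function name create_issue, the latest_issue column on repos and the title column are hypothetical. A function runs inside the caller's transaction, so the row lock taken by SELECT ... FOR UPDATE is held until the caller commits or rolls back; keeping those transactions short keeps other inserts for the same repo from waiting long.

```sql
-- Minimal schema assumed for this sketch: latest_issue is a hypothetical
-- per-repo counter holding the highest number handed out so far.
CREATE TABLE repos (
    id           bigint  PRIMARY KEY,
    latest_issue integer NOT NULL DEFAULT 0
);
CREATE TABLE issues (
    id      bigserial PRIMARY KEY,
    repo_id bigint  NOT NULL REFERENCES repos (id),
    number  integer NOT NULL,
    title   text    NOT NULL,
    UNIQUE (repo_id, number)
);

-- Hypothetical function; returns the number assigned to the new issue.
CREATE FUNCTION create_issue(p_repo_id bigint, p_title text)
RETURNS integer
LANGUAGE plpgsql AS $$
DECLARE
    next_number integer;
BEGIN
    -- Step 2 + 3: lock the repo row and read its counter. Concurrent
    -- calls for the same repo wait here until this transaction ends.
    SELECT latest_issue + 1 INTO next_number
      FROM repos
     WHERE id = p_repo_id
       FOR UPDATE;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'repo % does not exist', p_repo_id;
    END IF;

    -- Record the number as used, then insert the issue (step 4).
    UPDATE repos SET latest_issue = next_number WHERE id = p_repo_id;

    INSERT INTO issues (repo_id, number, title)
    VALUES (p_repo_id, next_number, p_title);

    RETURN next_number;
END;
$$;
```

Assuming a repo with id 1 exists, SELECT create_issue(1, 'Crash on startup'); returns the number given to the new issue. Compared with computing max(number), the counter column also means that the number of a deleted issue is never handed out again.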

answered Nov 13 '22 10:11

Renzo
