How to quickly mass-update sequential numbers in Postgres

My boss wants sequential order numbers for each merchant, starting at 1000.

Right now I'm looping through each merchant (using Ruby) and updating the orders like this:

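# running all of this in a migration
add_column :orders, :order_seq, :integer
Order.reset_column_information # make the Order model see the new column

Merchant.find_each do |merchant|
  order_seq = 999
  # find_each ignores a custom .order scope and always batches by primary key,
  # so iterate with each to keep the ordered_at ordering
  merchant.orders.order(:ordered_at).each do |order|
    order.update_column(:order_seq, order_seq += 1)
  end
end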

I was planning to run this in a migration so that all existing orders get sequence numbers according to their ordered_at date. I tested this on a fork of the production database, and each order update takes 80 ms on average. With close to a million order records, that adds up to roughly 22 hours (80 ms × 1,000,000 = 80,000 s), which is far too much downtime.

Is there a faster way to do this in native Postgres? This is a one-time migration, and nothing else will be running concurrently.

I'm not a Postgres expert, but is there a way to use a window function, something like 999 + row_number() partitioned by merchant_id, and save that value back into the order_seq column?
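To make the window-function idea concrete, here is a minimal pure-Ruby sketch of what 999 + row_number() over (partition by merchant_id order by ordered_at) would assign. The OrderRow struct, the assign_order_seq helper, and the sample rows are hypothetical stand-ins for the orders table, and breaking ties on ordered_at by id is an added assumption (a plain order by ordered_at leaves the order of ties unspecified).

```ruby
# Hypothetical in-memory stand-in for rows of the orders table.
OrderRow = Struct.new(:id, :merchant_id, :ordered_at, :order_seq)

# Mimics: 999 + row_number() OVER (PARTITION BY merchant_id ORDER BY ordered_at)
# row_number() starts at 1, so the first order of each merchant gets 1000.
# Ties on ordered_at are broken by id (an assumption; SQL leaves ties unordered).
def assign_order_seq(rows)
  rows.group_by(&:merchant_id).each_value do |merchant_rows|
    merchant_rows.sort_by { |r| [r.ordered_at, r.id] }
                 .each_with_index { |r, i| r.order_seq = 999 + (i + 1) }
  end
  rows
end

orders = [
  OrderRow.new(1, 7, Time.utc(2014, 1, 3)),
  OrderRow.new(2, 7, Time.utc(2014, 1, 1)),
  OrderRow.new(3, 9, Time.utc(2014, 1, 2)),
  OrderRow.new(4, 7, Time.utc(2014, 1, 2))
]

assign_order_seq(orders).each do |o|
  puts "order #{o.id} (merchant #{o.merchant_id}): #{o.order_seq}"
end
```

For merchant 7, order 2 gets 1000, order 4 gets 1001 and order 1 gets 1002, while merchant 9's only order starts again at 1000. The point of the SQL version is that Postgres computes all of these numbers in one pass instead of one round trip per order.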

EDIT:

Using @Gordon-Linoff's answer, slightly modified. I realized I didn't need to partition by merchant_id, because only some active merchants needed this, not the entire table, so I run one update per active merchant instead. In addition, the update needed to be on the orders table, not the merchants table, and the WHERE clause can simply join on id instead of merchant_id and ordered_at.

Final solution:

  Merchant.active.find_each(batch_size: 100) do |merchant|
    # merchant.id is an integer primary key read from the database, not user input
    statement = <<-SQL
      update orders set order_seq = o.seqnum + 999
      from (select o.id, row_number() over (order by ordered_at) as seqnum
            from orders o
            where o.merchant_id = #{merchant.id}) o
      where orders.id = o.id
    SQL
    ActiveRecord::Base.connection.execute(statement)
  end

The result: this takes 10 minutes to process 200 merchants, whereas the old method processed about 10 merchants per hour, roughly a 120× speedup.
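A possible single-statement variant, sketched under assumptions: instead of one UPDATE per merchant, pass all the active merchant ids at once and restart the numbering with partition by merchant_id, as in the partitioned query from the answer. order_seq_sql is a hypothetical helper name, adding id as a tie-breaker in the ORDER BY is an added assumption, and this has not been benchmarked against the per-merchant loop.

```ruby
# Hypothetical helper: builds ONE UPDATE that numbers the orders of every
# given merchant, instead of issuing one UPDATE per merchant.
def order_seq_sql(merchant_ids)
  # Integer() raises on anything that is not an integer, so only numeric ids
  # are ever interpolated into the SQL string.
  ids = merchant_ids.map { |id| Integer(id) }
  raise ArgumentError, "no merchant ids given" if ids.empty?

  <<~SQL
    UPDATE orders
    SET order_seq = o.seqnum + 999
    FROM (
      SELECT id,
             row_number() OVER (PARTITION BY merchant_id
                                ORDER BY ordered_at, id) AS seqnum
      FROM orders
      WHERE merchant_id IN (#{ids.join(', ')})
    ) o
    WHERE orders.id = o.id
  SQL
end

puts order_seq_sql([3, 17, 42])
```

Inside a migration it could be called as execute(order_seq_sql(Merchant.active.pluck(:id))), where active is the scope from the question.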

asked Jul 15 '14 18:07 by Homan




1 Answer

I think you can do this in native Postgres with an UPDATE ... FROM that joins against a subquery computing row_number():

update merchants
    set order_seq = m.seqnum + 999
    from (select m.*, row_number() over (order by ordered_at) as seqnum
          from merchants m
         ) m
    where merchants.merchant_id = m.merchant_id and
          merchants.ordered_at = m.ordered_at;

EDIT:

If you want the numbering to start over for each merchant_id, just add partition by:

update merchants
    set order_seq = m.seqnum + 999
    from (select m.*, row_number() over (partition by merchant_id
                                         order by ordered_at
                                        ) as seqnum
          from merchants m
         ) m
    where merchants.merchant_id = m.merchant_id and
          merchants.ordered_at = m.ordered_at;
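One caveat with the join in these queries: it matches rows on merchant_id and ordered_at. The PostgreSQL documentation for UPDATE ... FROM notes that if a target row joins to more than one row from the other table, only one of the join rows is used, and which one is not readily predictable. So two orders from the same merchant with identical ordered_at values could end up with the same number; joining on id, as in the asker's final solution, avoids that. A minimal Ruby sketch for spotting such duplicate pairs (ambiguous_join_keys and the sample rows are hypothetical):

```ruby
# Hypothetical helper: returns the [merchant_id, ordered_at] pairs that occur
# more than once. Any such pair makes a join on those two columns ambiguous.
def ambiguous_join_keys(rows)
  rows.group_by { |r| [r[:merchant_id], r[:ordered_at]] }
      .select { |_key, group| group.size > 1 }
      .keys
end

rows = [
  { id: 1, merchant_id: 7, ordered_at: "2014-01-01 10:00" },
  { id: 2, merchant_id: 7, ordered_at: "2014-01-01 10:00" },
  { id: 3, merchant_id: 9, ordered_at: "2014-01-01 10:00" }
]

p ambiguous_join_keys(rows)
```

Orders 1 and 2 collide, while order 3 belongs to a different merchant and does not. The same check can be run directly in SQL by grouping on merchant_id, ordered_at with HAVING count(*) > 1.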
answered Sep 28 '22 15:09 by Gordon Linoff