
Altering MySQL table column type from INT to BIGINT

I have a table with just under 50 million rows. Its auto-increment ID column has hit the INT limit (2147483647). At the moment the table is not being written to.

I am planning to change the ID column from INT to BIGINT, using a Rails migration with the following up method:

  def up
    execute('ALTER TABLE table_name MODIFY COLUMN id BIGINT(8) NOT NULL AUTO_INCREMENT')
  end

I have tested this locally on a dataset of 2000 rows and it worked fine. Should running the ALTER TABLE command across the 50 million rows be OK, given that the table is not being used at the moment?

I wanted to check before I run the migration. Any input would be appreciated, thanks!

Asked Jul 02 '19 by Arthur


1 Answer

We had exactly the same scenario, but with PostgreSQL, and I know how 50M rows can end up filling the whole INT range: gaps in the IDs, created by rows deleted over time, incomplete or rolled-back transactions, and other factors.

I will explain what we ended up doing, but first: seriously, testing a data migration for 50M rows on 2k rows is not a good test.

There can be multiple solutions to this problem, depending on factors such as which DB provider you are using. We were using Amazon RDS, which has limits on runtime and on what they call IOPS (input/output operations per second). If you run such an intensive query on a DB with those limits, it will exhaust its IOPS quota midway through, and once the IOPS quota runs out the DB becomes too slow to be of much use. We had to cancel our query and let the IOPS catch up, which takes about 30 minutes to an hour.

If you have no such restrictions and have the DB on premises or something like that, then there is another factor: can you afford downtime?

If you can afford downtime and have no IOPS-type restriction on your DB, you can run this query directly (a minimal sketch follows the point below). It will take a lot of time (maybe half an hour or so, depending on a lot of factors), and in the meantime:

  • The table will be locked while rows are being changed, so make sure it is getting not only no writes but also no reads during the process, so that it runs through to the end smoothly without any deadlock-type situation.
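
For reference, a minimal sketch of that direct option, using the table and column names from the question. The ALGORITHM/LOCK clauses are optional; as far as I know they just make MySQL raise an error if it cannot do the copy while still allowing reads, instead of silently taking a more restrictive lock:

  -- Changing a column's data type forces a full table copy in MySQL,
  -- so request it explicitly and allow concurrent reads only.
  ALTER TABLE table_name
    MODIFY COLUMN id BIGINT NOT NULL AUTO_INCREMENT,
    ALGORITHM=COPY, LOCK=SHARED;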

What we did to avoid downtime and the Amazon RDS IOPS limits:

In my case, we still had about 40M IDs left in the table when we realized it was going to run out, and we wanted to avoid downtime. So we took a multi-step approach (a MySQL-flavoured SQL sketch of these steps follows the list):

  1. Create a new BIGINT column, name it new_id or something (give it a unique index from the start); it will be nullable, with a default of NULL.
  2. Write background jobs that run a few times each night and backfill the new_id column from the id column. We were backfilling about 4-5M rows each night, and a lot more over weekends (as our app had no traffic on weekends).
  3. Once the backfill has caught up, stop all access to this table (we just took our app down for a few minutes at night) and create a new sequence starting from the max(new_id) value, or reuse the existing sequence and bind it to the new_id column with a default value of nextval of that sequence.
  4. Switch the primary key from id to new_id; before that, make new_id NOT NULL.
  5. Delete the id column.
  6. Rename new_id to id.
  7. And resume your DB operations.
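
Here is a minimal SQL sketch of those steps, adapted to MySQL (the question's database; we were on PostgreSQL). It assumes the table_name/id names from the question; new_id, the index name and the batch size are just placeholders, and MySQL's AUTO_INCREMENT counter takes over the role of the sequence from step 3. Note that some of these ALTERs still rebuild the table in MySQL, so treat this as an outline to adapt rather than a drop-in script:

  -- Step 1: add the new BIGINT column, nullable, with a unique index from the start.
  ALTER TABLE table_name
    ADD COLUMN new_id BIGINT NULL DEFAULT NULL,
    ADD UNIQUE INDEX idx_table_name_new_id (new_id);

  -- Step 2: backfill in batches from a background job; repeat until no rows are updated.
  UPDATE table_name SET new_id = id WHERE new_id IS NULL LIMIT 50000;

  -- Steps 3-4: inside the maintenance window, make new_id NOT NULL, remove
  -- AUTO_INCREMENT from the old column, and move the primary key over.
  ALTER TABLE table_name MODIFY COLUMN new_id BIGINT NOT NULL;
  ALTER TABLE table_name
    MODIFY COLUMN id INT NOT NULL,
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (new_id);

  -- Step 5: delete the old id column.
  ALTER TABLE table_name DROP COLUMN id;

  -- Step 6: rename new_id to id, make it AUTO_INCREMENT again, and drop the
  -- helper index; if the counter does not pick up max(id) + 1 automatically,
  -- set it with ALTER TABLE table_name AUTO_INCREMENT = <max id + 1>.
  ALTER TABLE table_name
    CHANGE COLUMN new_id id BIGINT NOT NULL AUTO_INCREMENT,
    DROP INDEX idx_table_name_new_id;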

The above is a minimal write-up of what we did; you can google some nice articles about it, one is this. This approach is not new and is pretty common, so I am sure you will find MySQL-specific ones too, or you can just adjust a couple of things in the article above and you should be good to go.

Answered Sep 27 '22 by Zia Ul Rehman Mughal