
How do you add a NOT NULL Column to a large table in SQL Server?

Tags:

sql-server

To add a NOT NULL column to a table with many records, a DEFAULT constraint needs to be applied. This constraint causes the entire ALTER TABLE command to take a long time to run if the table is very large. I have some assumptions about why this is.

Assumptions:

  1. The DEFAULT constraint modifies existing records. This means that the db needs to increase the size of each record, which causes it to shift records on full data-pages to other data-pages and that takes time.
  2. The DEFAULT update executes as an atomic transaction. This means that the transaction log will need to be grown so that a roll-back can be executed if necessary.
  3. The transaction log keeps track of the entire record. Therefore, even though only a single field is modified, the space needed by the log will be based on the size of the entire record multiplied by the number of existing records. This means that adding a column to a table with small records will be faster than adding a column to a table with large records, even if the total number of records is the same for both tables.

Possible solutions:

  1. Suck it up and wait for the process to complete. Just make sure to set the timeout period to be very long. The problem with this is that it may take hours or days to do depending on the # of records.
  2. Add the column but allow NULL. Afterward, run an UPDATE query to set the DEFAULT value for existing rows. Do not update every row in a single statement; update the records in batches, or you'll end up with the same problem as solution #1. The problem with this approach is that you end up with a column that allows NULL when you know that this is an unnecessary option. I believe there are some best-practice documents out there that say you should not have columns that allow NULL unless it's necessary.
  3. Create a new table with the same schema. Add the column to that schema. Transfer the data over from the original table. Drop the original table and rename the new table. I'm not certain how this is any better than #1.
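
For solution 3, a rough sketch might look like the following. All names here (dbo.MyTable, dbo.MyTable_New, MyTable_Old, MyColumn, the varchar(40) type and the '' default) are placeholders for illustration, not taken from a real schema. SELECT ... INTO copies only columns and data, so indexes, constraints, triggers and permissions would have to be recreated on the new table before the swap.

```sql
-- Hypothetical names throughout; adjust to the real schema.
-- Copy every existing row into a new table and add the extra column in the same pass.
-- Wrapping the expression in ISNULL(...) should make SELECT INTO create the column
-- as NOT NULL; it is worth verifying the result with sp_help before the swap.
SELECT t.*,
       ISNULL(CAST('' AS varchar(40)), '') AS MyColumn
INTO dbo.MyTable_New
FROM dbo.MyTable AS t;
GO

-- Recreate indexes, constraints, triggers and permissions on dbo.MyTable_New here.

-- Swap the tables. The old table is kept under a new name until the copy is verified.
BEGIN TRANSACTION;
EXEC sp_rename 'dbo.MyTable', 'MyTable_Old';
EXEC sp_rename 'dbo.MyTable_New', 'MyTable';
COMMIT TRANSACTION;
GO
```

Rows written to the original table while the copy is running are not carried over, so this sketch assumes the table is not being modified during the operation.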

Questions:

  1. Are my assumptions correct?
  2. Are these my only solutions? If so, which one is the best? If not, what else could I do?
MrB asked Nov 13 '08 19:11



2 Answers

I ran into this problem at work as well, and my solution is along the lines of #2.

Here are my steps (I am using SQL Server 2005):

1) Add the column to the table with a default value:

ALTER TABLE MyTable ADD MyColumn varchar(40) DEFAULT('') 

2) Add a CHECK constraint that enforces NOT NULL, using the NOCHECK option. NOCHECK means the constraint is not validated against existing values:

ALTER TABLE MyTable WITH NOCHECK ADD CONSTRAINT MyColumn_NOTNULL CHECK (MyColumn IS NOT NULL) 

3) Update the values incrementally in the table:

GO
UPDATE TOP(3000) MyTable SET MyColumn = '' WHERE MyColumn IS NULL
GO 1000

  • The UPDATE statement updates at most 3000 records per execution, so the data is saved one chunk at a time. I have to use "MyColumn IS NULL" because my table does not have a sequential primary key.

  • GO 1000 executes the preceding batch 1000 times, which updates up to 3 million records; if you need more, just increase this number. Once no NULL rows remain, the remaining executions update 0 records.
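
If a fixed repeat count is not convenient, a loop that stops by itself is one alternative. The following is a minimal sketch, assuming the same MyTable and MyColumn as above; the variable names @BatchSize and @RowsAffected are made up for illustration.

```sql
-- Hypothetical variable names; the table and column are the ones used in the steps above.
DECLARE @BatchSize int;
DECLARE @RowsAffected int;
SET @BatchSize = 3000;
SET @RowsAffected = 1;

-- Each UPDATE runs as its own autocommit transaction, so only one batch
-- of changes is ever in flight at a time.
WHILE @RowsAffected > 0
BEGIN
    UPDATE TOP (@BatchSize) MyTable
    SET MyColumn = ''
    WHERE MyColumn IS NULL;

    -- @@ROWCOUNT must be read immediately after the UPDATE.
    -- It reaches 0 once no NULL rows are left, which ends the loop.
    SET @RowsAffected = @@ROWCOUNT;
END
```

Unlike GO 1000, this does not need the row count to be estimated up front, and it does not depend on the client tool supporting a repeat count after GO.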

DHornpout answered Sep 20 '22 12:09


Here's what I would try:

  • Do a full backup of the database.
  • Add the new column, allowing nulls - don't set a default.
  • Set SIMPLE recovery, which lets the tran log be truncated at the next checkpoint once each batch is committed.
  • The SQL is: ALTER DATABASE XXX SET RECOVERY SIMPLE
  • Run the update in batches as you discussed above, committing after each one.
  • Reset the new column to no longer allow nulls.
  • Go back to the normal FULL recovery.
  • The SQL is: ALTER DATABASE XXX SET RECOVERY FULL
  • Back up the database again. This also restarts the log backup chain, which the switch to SIMPLE recovery breaks.

The use of the SIMPLE recovery model doesn't stop logging, but it significantly reduces its impact. This is because the server can reuse the log space of committed transactions after each checkpoint, instead of keeping it until the next log backup.
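
Put together, the steps above might look roughly like this. It is only a sketch: MyDatabase, dbo.MyTable, MyColumn, the varchar(40) type, the '' value, the batch size of 3000 and the backup paths are all placeholders, not details from the question.

```sql
-- Hypothetical names and paths throughout; adjust before use.
BACKUP DATABASE MyDatabase TO DISK = 'C:\Backups\MyDatabase_before.bak';
GO

-- Add the column as nullable with no default, so existing rows are not rewritten.
ALTER TABLE dbo.MyTable ADD MyColumn varchar(40) NULL;
GO

ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;
GO

-- Fill the column in batches; each UPDATE commits on its own.
DECLARE @RowsAffected int;
SET @RowsAffected = 1;
WHILE @RowsAffected > 0
BEGIN
    UPDATE TOP (3000) dbo.MyTable
    SET MyColumn = ''
    WHERE MyColumn IS NULL;

    SET @RowsAffected = @@ROWCOUNT;

    -- Under SIMPLE recovery, a checkpoint allows the log space used by
    -- committed batches to be reused.
    CHECKPOINT;
END
GO

-- Every row now has a value, so the column can be switched to NOT NULL.
ALTER TABLE dbo.MyTable ALTER COLUMN MyColumn varchar(40) NOT NULL;
GO

ALTER DATABASE MyDatabase SET RECOVERY FULL;
GO

BACKUP DATABASE MyDatabase TO DISK = 'C:\Backups\MyDatabase_after.bak';
GO
```

The GO separators matter here: the UPDATE has to be in a later batch than the ALTER TABLE that adds the column, otherwise the batch fails to compile because MyColumn does not exist yet.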

HTTP 410 answered Sep 17 '22 12:09