
Considerations when dropping columns in large tables

I have a table of call data that has grown to 1.3 billion rows and 173 gigabytes. There are two columns that we no longer use: one is char(15) and the other varchar(24). Both have been getting inserted with NULL for some time. I've been putting off removing the columns because I am unsure of the implications. We have limited space on both the drive with the database and the drive with the transaction log.

In addition, I found this post saying the space would not be available until a DBCC DBREINDEX was done. I see this as both good and bad. It's good because dropping the columns should be very fast and not involve a lot of logging, but bad because the space will not be reclaimed. Will newly inserted records take up less space, though? That would be fine in my case, as we prune data older than 18 months, so the space used would gradually decrease.

If we did a DBCC DBREINDEX (or ALTER INDEX REBUILD), would that actually help, given that the columns are not part of any index? Would it take up log space or lock the table so that it could not be used?
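For concreteness, the operations I'm asking about would look something like this (call_data and the legacy column names here are stand-ins for our real ones):

ALTER TABLE dbo.call_data DROP COLUMN legacy_char_col, legacy_varchar_col; -- metadata-only change; rows are not rewritten
ALTER INDEX ALL ON dbo.call_data REBUILD; -- physically rebuilds the data, reclaiming the dropped columns' space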

Asked Feb 23 '11 by Jason Goemaat



1 Answer

I found your question interesting, so I decided to model it on a development database: SQL Server 2008, database size 400 MB, log 2.4 GB. I assume, from the link provided, that you created a table with a clustered index:

CREATE TABLE [dbo].[big_table](
    [recordID] [int] IDENTITY(1,1) NOT NULL,
    [col1] [varchar](50) NOT NULL,
    [col2] [char](15) NULL,
    [col3] [varchar](24) NULL,
 CONSTRAINT [PK_big_table] PRIMARY KEY CLUSTERED 
(
    [recordID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY] 

This table consists of 12 million records.

EXEC sp_spaceused 'big_table', 'true';

name       rows       reserved    data        index_size  unused
big_table  12031303   399240 KB   397760 KB   1336 KB     144 KB

Drop the two unused columns.
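The exact statement isn't shown above, but for this schema it would amount to:

ALTER TABLE dbo.big_table DROP COLUMN col2, col3; -- metadata-only change; existing pages are not rewritten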

EXEC sp_spaceused 'big_table', 'true';

The table size stays the same, and the database and log sizes also remain unchanged.

Add 3 million rows to the table.
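One way to generate that many filler rows (a sketch with arbitrary test values; the actual script used isn't shown):

-- Cross-join a catalog view with itself to produce enough rows, then cap at 3 million
INSERT INTO dbo.big_table (col1)
SELECT TOP (3000000) REPLICATE('x', 50)
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;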

name       rows       reserved    data        index_size  unused
big_table  15031303   511816 KB   509904 KB   1752 KB     160 KB

Database size is now 500 MB, log 3.27 GB.

After

DBCC DBREINDEX ('big_table');

the log is the same size, but the database size increased to 866 MB:

name       rows       reserved    data        index_size  unused
big_table  12031303   338376 KB   337704 KB   568 KB      104 KB

Again, add 3 million rows to see whether they go into the freed space within the database. The database size stays the same while the log grows to 3.96 GB, which clearly shows that they do.
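A side note on the rebuild itself: DBCC DBREINDEX is deprecated, and the supported equivalent is ALTER INDEX ... REBUILD. That also answers the locking question: an offline rebuild holds a table lock for its duration, while the ONLINE = ON option (Enterprise edition) keeps the table readable and writable. Either way the rebuild is logged, so watch the log drive. A sketch against the test table above:

ALTER INDEX PK_big_table ON dbo.big_table REBUILD; -- offline: takes a table lock while it runs
-- ALTER INDEX PK_big_table ON dbo.big_table REBUILD WITH (ONLINE = ON); -- Enterprise edition only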

Hope it makes sense.

Answered Oct 27 '22 by Pavel Nefyodov