
overhead of varchar(max) columns with small data

As part of a bulk load of data from an external source, the staging table is defined with varchar(max) columns. The idea is that each column will be able to hold whatever it finds in the source CSV file, and that we'll validate the data (for type, size, precision etc.) later.

But I'm concerned that the varchar(max) column has a lot of overhead for columns that hold fewer than 200 characters. The fellow who designed this assures me this is best practice for ETL, but I thought I would validate that assertion with the community.
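For illustration, a staging table in this style might look roughly like the sketch below; the table and column names are hypothetical, not from the actual load:

    CREATE TABLE dbo.StagingImport
    (
        RowId      INT IDENTITY(1,1) PRIMARY KEY,
        CustomerId VARCHAR(MAX) NULL,  -- everything lands as raw text
        OrderDate  VARCHAR(MAX) NULL,  -- validated/converted to DATE later
        Amount     VARCHAR(MAX) NULL   -- validated/converted to DECIMAL later
    );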

asked May 13 '09 by Ralph Shillington


1 Answer

VARCHAR(MAX) column values will be stored IN the table row, space permitting. So if you have a single VARCHAR(MAX) field and its contents are 200 or 300 bytes, chances are it'll be stored inline with the rest of your data. No problem or additional overhead here.

Only when the entire data of a single row can no longer fit on a single SQL Server page (8 KB) will SQL Server move the VARCHAR(MAX) data into overflow pages.

So all in all, I think you get the best of both worlds - inline storage when possible, overflow storage when necessary.
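If you want to check where the data actually lands, one way (using the hypothetical staging table from the question sketch) is to look at the table's allocation units: in-row values are counted under IN_ROW_DATA, while overflowed VARCHAR(MAX) values show up under LOB_DATA.

    SELECT  alloc_unit_type_desc,   -- IN_ROW_DATA, LOB_DATA or ROW_OVERFLOW_DATA
            page_count,
            record_count
    FROM    sys.dm_db_index_physical_stats(
                DB_ID(), OBJECT_ID('dbo.StagingImport'), NULL, NULL, 'DETAILED');

If the staged values really are a few hundred characters each, you would expect nearly all pages to be reported as IN_ROW_DATA.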

Marc

PS: As Mitch points out, this default behaviour can be turned off - I don't see any compelling reason to do so, however....
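The setting Mitch refers to is the table-level 'large value types out of row' option. If you did want VARCHAR(MAX) values always pushed off-row (again using the hypothetical table name from above), it would look roughly like this:

    -- 1 = always store MAX-type values off-row, leaving only a 16-byte pointer in the row
    EXEC sp_tableoption 'dbo.StagingImport', 'large value types out of row', 1;

    -- 0 (the default) = keep MAX-type values in-row whenever they fit
    EXEC sp_tableoption 'dbo.StagingImport', 'large value types out of row', 0;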

answered Oct 03 '22 by marc_s