Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Should I use an inline varchar(max) column or store it in a separate table?

I want to create a table in MS SQL Server 2005 to record details of certain system operations. As you can see from the table design below, every column apart from Details is is non nullable.

CREATE TABLE [Log]
(
[LogID] [int] IDENTITY(1,1) NOT NULL,
[ActionID] [int] NOT NULL,
[SystemID] [int] NOT NULL,
[UserID] [int] NOT NULL,
[LoggedOn] [datetime] NOT NULL,
[Details] [varchar](max) NULL
)

Because the Details column won't always have data in it. Is it more efficient to store this column in a separate table and provide a link to it instead?

CREATE TABLE [Log]
(
[LogID] [int] IDENTITY(1,1) NOT NULL,
[ActionID] [int] NOT NULL,
[SystemID] [int] NOT NULL,
[UserID] [int] NOT NULL,
[LoggedOn] [datetime] NOT NULL,
[DetailID] [int] NULL
)       

CREATE TABLE [Detail]
(
[DetailID] [int] IDENTITY(1,1) NOT NULL,
[Details] [varchar](max) NOT NULL
)

For a smaller data type I wouldn't really consider it, but for a varchar(max) does doing this help keep the table size smaller? Or I am just trying to out smart the database and achieving nothing?

like image 756
Peter Bridger Avatar asked Nov 09 '09 15:11

Peter Bridger


People also ask

Should you always use VARCHAR Max?

CHAR, VARCHAR, and VARCHAR(MAX) CHAR columns should be used for columns that vary little in length. String values that vary significantly in length and are no longer than 8,000 bytes should be stored in a VARCHAR column. If you have huge strings (over 8,000 bytes), then VARCHAR(MAX) should be used.

Does VARCHAR Max affect performance?

Yes, it can affect performance, as space needs to be allocated to hold large values in the query engine. In your case, you could also use a suitably large size such as varchar(50) which would easily hold whatever you needed.

Does VARCHAR max waste space?

A varchar will only use as much space as inputted as opposed to a char which will pad with white space. Traffic size on a varchar column, therefore, is considerably smaller than a char column. Show activity on this post. Correct, it will not make any difference.

Is VARCHAR Max same as VARCHAR 8000?

About varchar(MAX)If your data is longer than 8000 characters varchar(MAX) is what you need. You can store up to 2GB size of data this way. In varchar(MAX) fields if your data size is shorter than 8000 characters your data is stored in row automatically (therefore the data execution is faster).


2 Answers

Keep it inline. Under the covers SQL Server already stores the MAX columns in a separate 'allocation unit' since SQL 2005. See Table and Index Organization. This in effect is exactly the same as keeping the MAX column in its own table, but w/o any disadvantage of explicitly doing so.

Having an explicit table would actually be both slower (because of the foreign key constraint) and consume more space (because of the DetaiID duplication). Not to mention that it requires more code, and bugs are introduced by... writing code.

alt text http://i.msdn.microsoft.com/ms189051.3be61595-d405-4b30-9794-755842d7db7e(en-us,SQL.100).gif

Update

To check the actual location of data, a simple test can show it:

use tempdb;
go

create table a (
  id int identity(1,1) not null primary key,
  v_a varchar(8000),
  nv_a nvarchar(4000),
  m_a varchar(max),
  nm_a nvarchar(max),
  t text,
  nt ntext);
go

insert into a (v_a, nv_a, m_a, nm_a, t, nt)
values ('v_a', N'nv_a', 'm_a', N'nm_a', 't', N'nt');
go

select %%physloc%%,* from a
go

The %%physloc%% pseudo column will show the actual physical location of the row, in my case it was page 200:

dbcc traceon(3604)
dbcc page(2,1, 200, 3)

Slot 0 Column 2 Offset 0x19 Length 3 Length (physical) 3
v_a = v_a                            
Slot 0 Column 3 Offset 0x1c Length 8 Length (physical) 8
nv_a = nv_a                          
m_a = [BLOB Inline Data] Slot 0 Column 4 Offset 0x24 Length 3 Length (physical) 3
m_a = 0x6d5f61                       
nm_a = [BLOB Inline Data] Slot 0 Column 5 Offset 0x27 Length 8 Length (physical) 8
nm_a = 0x6e006d005f006100            
t = [Textpointer] Slot 0 Column 6 Offset 0x2f Length 16 Length (physical) 16
TextTimeStamp = 131137536            RowId = (1:182:0)                    
nt = [Textpointer] Slot 0 Column 7 Offset 0x3f Length 16 Length (physical) 16
TextTimeStamp = 131203072            RowId = (1:182:1)   

All column values but the TEXT and NTEXT were stored inline, including the MAX types.
After changing the table options and insert a new row (sp_tableoption does not affect existing rows), the MAX types were evicted into their own storage:

sp_tableoption 'a' , 'large value types out of row', '1';
insert into a (v_a, nv_a, m_a, nm_a, t, nt)
values ('2v_a', N'2nv_a', '2m_a', N'2nm_a', '2t', N'2nt');    
dbcc page(2,1, 200, 3);

Note how m_a and nm_a columns are now a Textpointer into the LOB allocation unit:

Slot 1 Column 2 Offset 0x19 Length 4 Length (physical) 4
v_a = 2v_a                           
Slot 1 Column 3 Offset 0x1d Length 10 Length (physical) 10
nv_a = 2nv_a                         
m_a = [Textpointer] Slot 1 Column 4 Offset 0x27 Length 16 Length (physical) 16
TextTimeStamp = 131268608            RowId = (1:182:2)                    
nm_a = [Textpointer] Slot 1 Column 5 Offset 0x37 Length 16 Length (physical) 16
TextTimeStamp = 131334144            RowId = (1:182:3)                    
t = [Textpointer] Slot 1 Column 6 Offset 0x47 Length 16 Length (physical) 16
TextTimeStamp = 131399680            RowId = (1:182:4)                    
nt = [Textpointer] Slot 1 Column 7 Offset 0x57 Length 16 Length (physical) 16
TextTimeStamp = 131465216            RowId = (1:182:5)                    

For completion sakeness we can also force the one of the non-max fields out of row:

update a set v_a = replicate('X', 8000);
dbcc page(2,1, 200, 3);

Note how the v_a column is stored in the Row-Overflow storage:

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
v_a = [BLOB Inline Root] Slot 0 Column 2 Offset 0x19 Length 24 Length (physical) 24
Level = 0                            Unused = 99                          UpdateSeq = 1
TimeStamp = 1098383360               
Link 0
Size = 8000                          RowId = (1:176:0) 

So, as other have already commented, the MAX types are stored inline by default, if they fit. For many DW projects this would be unnacceptable because the typical DW loads must scan or at least range scan, so the sp_tableoption ..., 'large value types out of row', '1' should be used. Note that this does not affect existing rows, in my test not even on index rebuild, so the option has to be turned on early.

For most OLTP type loads though the fact that MAX types are stored inline if possible is actually an advantage, since the OLTP access pattern is to seek and the row width makes little impact on it.

None the less, regarding the original question: separate table is not necessary. Turning on the large value types out of row option achieves the same result at a free cost for development/test.

like image 174
Remus Rusanu Avatar answered Sep 16 '22 18:09

Remus Rusanu


Paradoxically, if your data is normally less than 8000 characters, I would store it in a separate table, while if the data is greater than 8000 characters, I would keep it in the same table.

This is because what happens is that SQL Server keeps the data in the page if it allows the row to sit in single page, but when the data gets larger, it moves it out just like the TEXT data type and leaves just a pointer in the row. So for a bunch of 3000 character rows, you are fitting less rows per page, which is really inefficient, but for a bunch of 12000 character rows, the data is out of the row, so it's actually more efficient.

Having said this, typically you have a wide ranging mix of lengths and thus I would move it into its own table. This gives you flexibility for moving this table to a different file group etc.

Note that you can also specify it to force the data out of the row using the sp_tableoption. varchar(max) is basically similar to the TEXT data type with it defaulting to data in row (for varchar(max)) instead of defaulting to data out of row (for TEXT).

like image 44
Cade Roux Avatar answered Sep 17 '22 18:09

Cade Roux