varchar(255), varchar(256), nvarchar(255), nvarchar(256), nvarchar(max), etc?
256 seems like a nice, round, space-efficient number. But I've seen 255 used a lot. Why?
What's the difference between varchar and nvarchar?
In MS SQL Server (7.0 and up), varchar data is represented internally with up to three values; the ones that matter here are the actual character data and a two-byte data-length indicator.
The important part is that two-byte data-length indicator. If it were one byte, you could only properly record strings of length 0 to 255; with two bytes, you can record strings of length 0 to 65,535 (2^16 - 1). However, the SQL Server page size is 8 KB, which is where the 8,000-character limit on varchar comes from. (There is row-overflow handling in SQL 2005, but if your strings are going to be that long you should just go with varchar(max).)
So, no matter how long you declare your varchar column to be (15, 127, 511), what you will actually be storing for each and every row is the actual characters of the string plus those two bytes of length information.
Which gets me to my point: a number of older systems used only one byte to store the string length, and that limited you to a maximum of 255 characters, which isn't all that long. With two bytes, you have no such arbitrary limit... and so I recommend picking a number that makes sense to the (presumably non-technical) user. Personally, I like 50, 100, 250, 500, even 1000. Given that base of 8,000 bytes of storage, 255 or 256 is just as efficient as 200 or 250, and less efficient when it comes time to explain things to the end users.
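If you want to see that for yourself, here is a minimal sketch (the table and column names are purely illustrative) showing that the declared varchar length has no effect on what each row actually stores:

    -- Two varchar columns with very different declared lengths.
    CREATE TABLE dbo.LengthDemo
    (
        short_col varchar(50),
        long_col  varchar(1000)
    );

    INSERT INTO dbo.LengthDemo (short_col, long_col)
    VALUES ('hello', 'hello');

    -- Both columns hold the same 5 bytes of character data, regardless of
    -- the declared maximum; the length prefix is the same size either way.
    SELECT DATALENGTH(short_col) AS short_bytes,  -- 5
           DATALENGTH(long_col)  AS long_bytes    -- 5
    FROM dbo.LengthDemo;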
This applies to single-byte data (i.e. ANSI, SQL_Latin1_General_CP1, et al.). If you have to store data for multiple code pages or languages using different alphabets, you'll need to work with the nvarchar data type (which I think works the same way, two bytes for the number of characters, but each actual character of data requires two bytes of storage). If you have strings likely to go over 8000 characters, or over 4000 in nvarchar, you will need to use the [n]varchar(max) datatypes.
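A quick way to see the varchar/nvarchar storage difference (again just a sketch, with throwaway variable names):

    DECLARE @v varchar(50), @n nvarchar(50);
    SET @v = 'hello';
    SET @n = N'hello';

    -- The same five characters take 5 bytes as varchar and 10 bytes as
    -- nvarchar, since nvarchar uses two bytes per character.
    SELECT DATALENGTH(@v) AS varchar_bytes,   -- 5
           DATALENGTH(@n) AS nvarchar_bytes;  -- 10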
And if you want to know why it is so very important to take up space with extra bytes just to track how long the data is, check out http://www.joelonsoftware.com/articles/fog0000000319.html
Philip
There are a couple of other points to consider when defining char/varchar and the N variations.
First, there is some overhead to storing variable-length strings in the database. A good general rule of thumb is to use CHAR for strings less than 10 characters long: N/VARCHAR stores the length alongside the string, and for values that short the space saved rarely justifies that extra length bookkeeping.
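For example, a table definition along these lines (all names here are purely illustrative) applies that rule of thumb:

    CREATE TABLE dbo.Customer
    (
        country_code char(2)       NOT NULL,  -- short, fixed width: no length prefix needed
        state_code   char(2)       NULL,
        full_name    varchar(100)  NOT NULL,  -- genuinely variable length
        notes        varchar(1000) NULL
    );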
Second, a table in SQL Server is stored on 8 KB (8,192-byte) pages, so the maximum size of a row of data is 8,060 bytes (the remaining 132 bytes are used for overhead by SQL Server). That's why SQL Server allows a maximum defined column of VARCHAR(8000) and NVARCHAR(4000). You can use VARCHAR(MAX) and the Unicode version, but there can be extra overhead associated with that.
If I'm not mistaken, SQL Server will try to store the data on the same page as the rest of the row, but if you put too much data into a VARCHAR(MAX) column, it will push the value off-row and store it on separate pages, keeping only a pointer in the row.
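Putting those size limits together, here is a sketch of the largest declarable lengths (the table name is illustrative):

    CREATE TABLE dbo.BigText
    (
        capped_ansi    varchar(8000),   -- largest non-MAX varchar you can declare
        capped_unicode nvarchar(4000),  -- largest non-MAX nvarchar you can declare
        unbounded      nvarchar(max)    -- kept in-row when small, pushed off-row when large
    );
    -- In SQL 2005 and later, rows wider than 8,060 bytes spill variable-length
    -- columns into row-overflow pages rather than failing outright.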
Another big difference between CHAR and VARCHAR has to do with page splits. Given that SQL Server stores data in 8 KB pages, any number of rows can end up on a single page. If you UPDATE a VARCHAR column with a value large enough that the row no longer fits on its page, the server will split that page, moving some of the records to a new page. If the database has no available pages and is set to auto-grow, the server will first grow the database file to allocate blank pages, then allocate pages to the table, and finally split the single page into two.
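A rough way to watch this happen (all object names here are hypothetical; index fragmentation is one visible symptom of the splits):

    CREATE TABLE dbo.SplitDemo
    (
        id   int IDENTITY PRIMARY KEY,
        note varchar(2000) NOT NULL
    );
    GO

    -- Fill a few pages with narrow (100-character) values.
    INSERT INTO dbo.SplitDemo (note)
    SELECT TOP (1000) REPLICATE('x', 100)
    FROM sys.objects a CROSS JOIN sys.objects b;  -- just a convenient row source
    GO

    -- Widen every value; rows that no longer fit on their page force page splits.
    UPDATE dbo.SplitDemo SET note = REPLICATE('x', 2000);

    -- Fragmentation jumps once the splits have scattered the pages.
    SELECT avg_fragmentation_in_percent, page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SplitDemo'),
                                        NULL, NULL, 'LIMITED');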