
Why use shorter VARCHAR(n) fields?

It is frequently advised to choose database field sizes to be as narrow as possible. I am wondering to what degree this applies to SQL Server 2005 VARCHAR columns: Storing 10-letter English words in a VARCHAR(255) field will not take up more storage than in a VARCHAR(10) field.

Are there other reasons to restrict the size of VARCHAR fields to stick as closely as possible to the size of the data? I'm thinking of

  • Performance: Is there an advantage to using a smaller n when selecting, filtering and sorting on the data?
  • Memory, including on the application side (C++)?
  • Style/validation: How important do you consider restricting column size to force nonsensical data imports to fail (such as 200-character surnames)?
  • Anything else?

Background: I help data integrators with the design of data flows into a database-backed system. They have to use an API that restricts their choice of data types. For character data, only VARCHAR(n) with n <= 255 is available; CHAR, NCHAR, NVARCHAR and TEXT are not. We're trying to lay down some "good practices" rules, and the question has come up whether there is a real detriment to using VARCHAR(255) even for data where real maximum sizes will never exceed 30 bytes or so.

Typical data volumes for one table are 1-10 million records with up to 150 attributes. Query performance (SELECT, frequently with extensive WHERE clauses) and application-side retrieval performance are paramount.

chryss asked Jun 11 '10 14:06



2 Answers

  1. Data Integrity - By far the most important reason. If you create a column called Surname that is 255 characters, you will likely get more than surnames. You'll get first name, last name, middle name. You'll get their favorite pet. You'll get "Alice in the Accounting Department with the Triangle hair". In short, you will make it easy for users to use the column as a notes/surname column. You want the cap to impede users who try to put something other than a surname into that column. If you have a column that calls for a specific length (e.g. a US tax identifier is nine characters) but the column is varchar(255), other developers will wonder what is going on and you will likely get crap data as well.

  2. Indexing and row limits. In SQL Server, a row is limited to 8060 bytes IIRC. Lots of fat non-varchar(max) columns with lots of data can quickly exceed that limit. In addition, index keys are capped at 900 bytes in width IIRC. So, if you wanted to index on your surname column and some others that contain lots of data, you could exceed this limit.

  3. Reporting and external systems. As a report designer you must assume that if a column is declared with a max length of 255, it could have 255 characters. If the user can do it, they will do it. Thus, saying "It probably won't have more than 30 characters" is not even remotely the same as saying "It cannot have more than 30 characters." Never rely on the former. As a report designer, you have to allow for the possibility that users will enter a bunch of data into a column. That either means truncating the values (and if that is the case, why have the additional space available?) or using CanGrow to make a lovely mess of a report. Either way, you make it harder for other developers to understand the intent of the column if the column size is so far out of whack with the actual data being stored.
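To make point 2 concrete, here is a rough Python sketch of the worst-case arithmetic. It assumes the 8060-byte row and 900-byte index key figures quoted above, one byte per VARCHAR character, and no per-row or per-column overhead. The column names are hypothetical, and how the server actually reacts when a limit is exceeded is not modelled.

```python
# Limits as quoted in the answer above; treat them as assumptions.
MAX_INDEX_KEY_BYTES = 900
MAX_ROW_BYTES = 8060


def worst_case_bytes(declared_lengths):
    """Sum of declared VARCHAR(n) lengths, assuming 1 byte per character
    and ignoring per-row and per-column storage overhead."""
    return sum(declared_lengths.values())


def fits_index_key(declared_lengths, key_columns):
    """True if the declared widths of the key columns stay within the key cap."""
    key_bytes = sum(declared_lengths[c] for c in key_columns)
    return key_bytes <= MAX_INDEX_KEY_BYTES


# Hypothetical columns, declared loosely vs. tightly.
loose = {"surname": 255, "first_name": 255, "city": 255, "country": 255}
tight = {"surname": 50, "first_name": 50, "city": 60, "country": 40}
key = ["surname", "first_name", "city", "country"]

print(fits_index_key(loose, key))  # False: 1020 bytes declared
print(fits_index_key(tight, key))  # True: 200 bytes declared

# The question mentions up to 150 attributes; all at VARCHAR(255):
wide_table = {f"attr{i}": 255 for i in range(150)}
print(worst_case_bytes(wide_table) > MAX_ROW_BYTES)  # True: 38250 bytes declared
```

The same four columns fit comfortably in an index key once their declared widths reflect the real data, which is the practical argument for sizing them tightly.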

Thomas answered Oct 01 '22 15:10


I think that the biggest issue is data validation. If you allow 255 characters for a surname, you WILL get a surname that's 200+ characters in your database.

Another reason is that if you allow the database to hold 255 characters, you now have to account for that possibility in every system that touches your database. For example, if you exported to a fixed-width file, all of your columns would have to be 255 characters wide, which could be pretty annoying or even problematic. That's just one example where it could cause a problem.
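The fixed-width export case can be sketched in a few lines of Python. This is only an illustration: the helper `fixed_width_line`, the column names, and the sample row are all hypothetical, and a real export would also have to deal with encodings and delimiters.

```python
def fixed_width_line(row, widths):
    """Pad each value to its declared column width; refuse values that do not fit."""
    fields = []
    for name, width in widths.items():
        value = row[name]
        if len(value) > width:
            raise ValueError(f"{name!r} exceeds declared width {width}")
        fields.append(value.ljust(width))
    return "".join(fields)


row = {"surname": "Smith", "first_name": "Alice"}
loose = {"surname": 255, "first_name": 255}  # columns declared VARCHAR(255)
tight = {"surname": 30, "first_name": 30}    # columns declared VARCHAR(30)

print(len(fixed_width_line(row, loose)))  # 510
print(len(fixed_width_line(row, tight)))  # 60
```

The exporter has to reserve the declared width, not the typical width, so every record in the loose layout carries 500 characters of padding for these two short names.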

Tom H answered Oct 01 '22 15:10