Is there a rule when we must use the Unicode types?
I have seen that most of the European languages (German, Italian, English, ...) are fine in the same database in VARCHAR columns.
I am looking for something like:
What about the collation of the server/database?
I don't want to use always NVARCHAR like suggested here What are the main performance differences between varchar and nvarchar SQL Server data types?
The key difference between varchar and nvarchar is the way they are stored, varchar is stored as regular 8-bit data(1 byte per character) and nvarchar stores data at 2 bytes per character. Due to this reason, nvarchar can hold upto 4000 characters and it takes double the space as SQL varchar.
When to use what? If your column will store a fixed-length Unicode characters like French, Arabic and so on characters then go for NCHAR. If the data stored in a column is Unicode and can vary in length, then go for NVARCHAR.
Use nvarchar when the sizes of the column data entries vary considerably. Use nvarchar(max) when the sizes of the column data entries vary considerably, and the string length might exceed 4,000 byte-pairs.
char and nchar are fixed-length which will reserve storage space for number of characters you specify even if you don't use up all that space. varchar and nvarchar are variable-length which will only use up spaces for the characters you store.
The real reason you want to use NVARCHAR is when you have different languages in the same column, you need to address the columns in T-SQL without decoding, you want to be able to see the data "natively" in SSMS, or you want to standardize on Unicode.
If you treat the database as dumb storage, it is perfectly possible to store wide strings and different (even variable-length) encodings in VARCHAR (for instance UTF-8). The problem comes when you are attempting to encode and decode, especially if the code page is different for different rows. It also means that the SQL Server will not be able to deal with the data easily for purposes of querying within T-SQL on (potentially variably) encoded columns.
Using NVARCHAR avoids all this.
I would recommend NVARCHAR for any column which will have user-entered data in it which is relatively unconstrained.
I would recommend VARCHAR for any column which is a natural key (like a vehicle license plate, SSN, serial number, service tag, order number, airport callsign, etc) which is typically defined and constrained by a standard or legislation or convention. Also VARCHAR for user-entered, and very constrained (like a phone number) or a code (ACTIVE/CLOSED, Y/N, M/F, M/S/D/W, etc). There is absolutely no reason to use NVARCHAR for those.
So for a simple rule:
VARCHAR when guaranteed to be constrained NVARCHAR otherwise
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With