I have the following two fields in a SQL Server table:
When I add some test data with accented characters into the field, it actually stores them! I thought I had to change the column from VARCHAR to NVARCHAR to accept accented characters, etc.?
Basically, I thought:

VARCHAR = ASCII
NVARCHAR = Unicode

So is this a case where façade etc. are actually ASCII, while some other characters would error (if VARCHAR)?
I can see the ç and é characters in the extended ASCII chart (link above) .. so does this mean ASCII includes 0->127 or 0->255?
(Side thought: I guess I'm happy with accepting 0->255 and stripping out anything else.)
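The 0->127 vs 0->255 distinction can be checked outside SQL Server. Strict ASCII is only 0–127; the 128–255 range means different things in different "extended ASCII" code pages, which is why the same byte can be an accented Latin letter in one code page and a Greek letter in another. A small Python sketch (Python's codecs implement the same Windows code pages):

```python
# ASCII proper is only 0-127; bytes 128-255 are code-page dependent.
b = bytes([0xE7])                 # one byte, value 231

print(b.decode("cp1252"))         # Windows-1252 (Western European): ç
print(b.decode("cp1253"))         # Windows-1253 (Greek): same byte decodes as η

try:
    b.decode("ascii")
except UnicodeDecodeError:
    print("0xE7 is not ASCII")    # strict ASCII rejects anything above 127
```

So "extended ASCII" charts describe one particular code page, not ASCII itself.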
Collations involved: Latin1_General_CI_AS and SQL_Latin1_General_CP1_CI_AS; SQL Server version 12.0.5223.6.
Use nvarchar when the sizes of the column data entries vary considerably. Use nvarchar(max) when the sizes of the column data entries vary considerably, and the string length might exceed 4,000 byte-pairs.
As the name suggests, varchar means character data that is varying. Also known as Variable Character, it is a variable-length string data type. It can hold numbers, letters and special characters.
Today's development platforms and their operating systems support the Unicode character set, so in SQL Server you should generally use NVARCHAR rather than VARCHAR. If you use VARCHAR where Unicode data is involved, encoding inconsistencies will arise while communicating with the database.
Use NVARCHAR instead of VARCHAR. SQL Server provides both datatypes to store character information. For the most part the two datatypes are identical in how you would work with them within SQL Server or from an application.
First, the details of what SQL Server is doing.
VARCHAR stores single-byte characters using a specific collation. ASCII only uses 7 bits, or half of the possible values in a byte. A collation references a specific code page (along with sorting and equating rules) to use the other half of the possible values in each byte. These code pages often include support for a limited and specific set of accented characters. If the code page used for your data supports a given accented character, you can store it; if it doesn't, you see weird results (unprintable "box" or ? characters). You can even output data stored in one collation as if it had been stored in another, and get really weird stuff that way (but don't do this).
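This is exactly why "façade" worked in the question's VARCHAR column: ç lives in the upper half (128–255) of code page 1252, which is what the SQL_Latin1_General_CP1_CI_AS collation uses for varchar. A quick illustration using Python's cp1252 codec (the same Windows code page, just exercised outside SQL Server):

```python
text = "façade"

# In code page 1252, ç is the single byte 0xE7 (231) -- it fits in varchar.
print(list(text.encode("cp1252")))                 # [102, 97, 231, 97, 100, 101]

# Strict 7-bit ASCII has no ç at all.
print(text.encode("ascii", errors="replace"))      # b'fa?ade'

# Characters outside the code page are lost -- the "weird results" above.
print("日本".encode("cp1252", errors="replace"))    # b'??'
```

The `?` bytes in the last line are the same substitution you see when SQL Server converts a string a code page cannot represent.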
NVARCHAR is Unicode, but there is still some reliance on collations. In most situations, you will end up with UTF-16, which allows for the full range of Unicode characters. Certain collations will instead give you UCS-2 semantics, which are slightly more limited. See the nchar/nvarchar documentation for more information.
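The UTF-16 vs UCS-2 difference only shows up for supplementary characters (code points above U+FFFF, such as emoji), which UTF-16 stores as a surrogate pair of two 2-byte code units. A sketch using Python's UTF-16 codec, which produces the same byte-pairs nvarchar stores:

```python
# Most characters take one 2-byte code unit in UTF-16; supplementary
# characters (e.g. emoji) need a surrogate pair, i.e. two code units.
for ch in ("é", "日", "😀"):
    units = len(ch.encode("utf-16-le")) // 2
    print(ch, "->", units, "UTF-16 code unit(s)")
```

Under older (non-_SC) collations, SQL Server's string functions count each code unit separately, so the emoji above would be treated as two characters; that is the "slightly more limited" UCS-2 behavior.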
As an additional quirk, the upcoming SQL Server 2019 will include support for UTF-8 in char and varchar types when using the correct collation.
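UTF-8 (the encoding behind SQL Server 2019's *_UTF8 collations) is variable-width, which is what makes it attractive for mostly-ASCII data: ASCII stays one byte, while other characters take two to four. Illustrated with Python's UTF-8 codec:

```python
# UTF-8 byte widths: ASCII 1 byte, accented Latin 2, CJK 3, emoji 4.
for ch in ("a", "é", "日", "😀"):
    print(ch, "->", len(ch.encode("utf-8")), "byte(s)")
```

So a UTF-8 varchar column can hold the full Unicode range while costing no more than today's varchar for plain ASCII content.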
Now to answer the question.
In some rare cases, where you are sure your data only needs to support accented characters originating from a single specific (usually local) culture, and only those specific accented characters, you can get by with the varchar type.
But be very careful making this determination. In an increasingly global and diverse world, where even small businesses want to take advantage of the internet to increase their reach, even within their own community, using an insufficient encoding can easily result in bugs and even security vulnerabilities. The majority of situations where it seems like a varchar encoding might be good enough are really not safe anymore.
Personally, about the only place I use varchar today is for mnemonic code strings that are never shown to or provided by an end user; things that might be enum values in procedural code. Even then, this tends to be legacy code, and given the option I'll use integer values instead, for faster joins and more efficient memory use. However, the upcoming UTF-8 support may change this.