I have the following two fields in a SQL Server table:
When I add some test data with accented characters into the field, it actually stores them! I thought I had to change the column from VARCHAR to NVARCHAR to accept accented characters, etc.?
Basically, I thought:

VARCHAR = ASCII
NVARCHAR = Unicode

So is this a case where façade etc. are actually ASCII, while some other characters would error (if VARCHAR)?
I can see the ç and é characters in the extended ASCII chart (link above) .. so does this mean ASCII includes 0->127 or 0->255?
(Side thought: I guess I'm happy with accepting 0->255 and stripping out anything else.)
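The 0->127 vs 0->255 distinction can be checked outside SQL Server. Strict ASCII is only 0–127; the 128–255 range means different things in different "extended ASCII" code pages, which is why the same byte can be an accented Latin letter in one code page and a Greek letter in another. A small Python sketch (Python's codecs implement the same Windows code pages):

```python
# ASCII proper is only 0-127; bytes 128-255 are code-page dependent.
b = bytes([0xE7])                 # one byte, value 231

print(b.decode("cp1252"))         # Windows-1252 (Western European): ç
print(b.decode("cp1253"))         # Windows-1253 (Greek): same byte decodes as η

try:
    b.decode("ascii")
except UnicodeDecodeError:
    print("0xE7 is not ASCII")    # strict ASCII rejects anything above 127
```

So "extended ASCII" charts describe one particular code page, not ASCII itself.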
Collations involved: Latin1_General_CI_AS and SQL_Latin1_General_CP1_CI_AS; SQL Server version 12.0.5223.6.
Use nvarchar when the sizes of the column data entries vary considerably. Use nvarchar(max) when the sizes of the column data entries vary considerably, and the string length might exceed 4,000 byte-pairs.
As the name suggests, varchar means character data that is varying. Also known as Variable Character, it is a variable-length string data type. It can hold numbers, letters and special characters.
Today's development platforms and their operating systems support the Unicode character set, so in SQL Server you should generally use NVARCHAR rather than VARCHAR. If you use VARCHAR where Unicode data is involved, encoding inconsistencies will arise while communicating with the database.
Use NVARCHAR instead of VARCHAR. SQL Server provides both datatypes to store character information. For the most part the two datatypes are identical in how you would work with them within SQL Server or from an application.
First, the details of what SQL Server is doing.
VARCHAR stores single-byte characters using a specific collation. ASCII only uses 7 bits, or half of the possible values in a byte. A collation references a specific code page (along with sorting and equating rules) to use the other half of the possible values in each byte. These code pages often include support for a limited and specific set of accented characters. If the code page used for your data supports a given accented character, you can store it; if it doesn't, you see weird results (unprintable "box" or ? characters). You can even output data stored in one collation as if it had been stored in another, and get really weird stuff that way (but don't do this).
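This is exactly why "façade" worked in the question's VARCHAR column: ç lives in the upper half (128–255) of code page 1252, which is what the SQL_Latin1_General_CP1_CI_AS collation uses for varchar. A quick illustration using Python's cp1252 codec (the same Windows code page, just exercised outside SQL Server):

```python
text = "façade"

# In code page 1252, ç is the single byte 0xE7 (231) -- it fits in varchar.
print(list(text.encode("cp1252")))                 # [102, 97, 231, 97, 100, 101]

# Strict 7-bit ASCII has no ç at all.
print(text.encode("ascii", errors="replace"))      # b'fa?ade'

# Characters outside the code page are lost -- the "weird results" above.
print("日本".encode("cp1252", errors="replace"))    # b'??'
```

The `?` bytes in the last line are the same substitution you see when SQL Server converts a string a code page cannot represent.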
NVARCHAR is Unicode, but there is still some reliance on collations. In most situations, you will end up with UTF-16, which allows for the full range of Unicode characters. Certain collations will instead give you UCS-2 semantics, which are slightly more limited. See the nchar/nvarchar documentation for more information.
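The UTF-16 vs UCS-2 difference only shows up for supplementary characters (code points above U+FFFF, such as emoji), which UTF-16 stores as a surrogate pair of two 2-byte code units. A sketch using Python's UTF-16 codec, which produces the same byte-pairs nvarchar stores:

```python
# Most characters take one 2-byte code unit in UTF-16; supplementary
# characters (e.g. emoji) need a surrogate pair, i.e. two code units.
for ch in ("é", "日", "😀"):
    units = len(ch.encode("utf-16-le")) // 2
    print(ch, "->", units, "UTF-16 code unit(s)")
```

Under older (non-_SC) collations, SQL Server's string functions count each code unit separately, so the emoji above would be treated as two characters; that is the "slightly more limited" UCS-2 behavior.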
As an additional quirk, the upcoming SQL Server 2019 will include support for UTF-8 in char and varchar types when using the correct collation.
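UTF-8 (the encoding behind SQL Server 2019's *_UTF8 collations) is variable-width, which is what makes it attractive for mostly-ASCII data: ASCII stays one byte, while other characters take two to four. Illustrated with Python's UTF-8 codec:

```python
# UTF-8 byte widths: ASCII 1 byte, accented Latin 2, CJK 3, emoji 4.
for ch in ("a", "é", "日", "😀"):
    print(ch, "->", len(ch.encode("utf-8")), "byte(s)")
```

So a UTF-8 varchar column can hold the full Unicode range while costing no more than today's varchar for plain ASCII content.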
Now to answer the question.
In some rare cases, where you are sure your data only needs to support accented characters originating from a single specific (usually local) culture, and only those specific accented characters, you can get by with the varchar type.
But be very careful making this determination. In an increasingly global and diverse world, where even small businesses want to take advantage of the internet to increase their reach, even within their own community, using an insufficient encoding can easily result in bugs and even security vulnerabilities. The majority of situations where it seems like a varchar encoding might be good enough are really not safe anymore.
Personally, about the only place I use varchar today is for mnemonic code strings that are never shown to or provided by an end user; things that might be enum values in procedural code. Even then, this tends to be legacy code, and given the option I'll use integer values instead, for faster joins and more efficient memory use. However, the upcoming UTF-8 support may change this.