As well as CHAR (CHARACTER) and VARCHAR (CHARACTER VARYING), SQL offers NCHAR (NATIONAL CHARACTER) and NVARCHAR (NATIONAL CHARACTER VARYING) types. In some databases, this is the better datatype to use for character (non-binary) strings:
- In SQL Server, NCHAR is stored as UTF-16LE and is the only way to reliably store non-ASCII characters, CHAR being limited to a single-byte code page;
- In Oracle, NVARCHAR may be stored as UTF-16 or UTF-8 rather than a single-byte collation;
- But in MySQL, NVARCHAR is VARCHAR, so it makes no difference; either type can be stored with UTF-8 or any other collation (a quick sketch of this follows the list).
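As a quick illustration of the MySQL point (the table and column names here are invented), NVARCHAR is accepted as a keyword but is silently rewritten to plain VARCHAR in MySQL's national character set:

    -- MySQL: NVARCHAR(50) is only a synonym for VARCHAR(50) in the national
    -- character set (historically utf8/utf8mb3); no separate type is stored.
    CREATE TABLE nvarchar_demo (
        name NVARCHAR(50)
    );

    -- The reported definition comes back as something like
    --   `name` varchar(50) CHARACTER SET utf8mb3
    SHOW CREATE TABLE nvarchar_demo;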
So, what does NATIONAL actually conceptually mean, if anything? The vendors' docs only tell you about what character sets their own DBMSs use, rather than the actual rationale. Meanwhile the SQL92 standard explains the feature even less helpfully, stating only that NATIONAL CHARACTER is stored in an implementation-defined character set. As opposed to a mere CHARACTER, which is stored in an implementation-defined character set. Which might be a different implementation-defined character set. Or not.
Thanks, ANSI. Thansi.
Should one use NVARCHAR for all character (non-binary) storage purposes? Are there currently-popular DBMSs in which it will do something undesirable, or which just don't recognise the keyword (or N'' literals)?
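For concreteness, here is a minimal sketch of the syntax in question, using the standard N'' literal prefix with an NVARCHAR column (the table and data are invented for illustration):

    -- Standard SQL national character types and literals; whether this buys
    -- you anything beyond plain VARCHAR depends on the DBMS, as discussed above.
    CREATE TABLE greeting (
        id      INT PRIMARY KEY,
        message NVARCHAR(100)
    );

    -- The N prefix marks a national character string literal.
    INSERT INTO greeting (id, message) VALUES (1, N'こんにちは');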
The NCHAR data type stores fixed-length character data. The data can be a string of single-byte or multibyte letters, digits, and other symbols that are supported by the code set of the database locale. The main difference between CHAR and NCHAR data types is the collating order.
The Oracle NCHAR datatype is used to store fixed-length Unicode character data. The character set of NCHAR can only be AL16UTF16 or UTF8, which is specified at database creation time as the national character set. The maximum byte length of an NCHAR column depends on the current national character set.
The NCHAR data type is a fixed-length character data type that supports localized collation. The NVARCHAR data type is a varying-length character data type that can store up to 255 bytes of text data and supports localized collation.
A national character string is a sequence of bytes that represents character data in UTF-8 or UTF-16BE encoding in a Unicode database. The length of the string is the number of code units in the sequence. If the length is zero, the value is called the empty string. This value should not be confused with the null value.
"NATIONAL" in this case means characters specific to different nationalities. Far east languages especially have so many characters that one byte is not enough space to distinguish them all. So if you have an english(ascii)-only app or an english-only field, you can get away using the older CHAR and VARCHAR types, which only allow one byte per character.
That said, most of the time you should use NCHAR/NVARCHAR. Even if you don't think you need to support (or may eventually need to support) multiple languages in your data, an English-only app still needs to handle security attacks that use foreign-language characters sensibly.
In my opinion, about the only place where the older CHAR/VARCHAR types are still preferred is for frequently referenced, ASCII-only internal codes and data on platforms like SQL Server that support the distinction: data that would be the equivalent of an enum in a client language like C++ or C#.
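A rough sketch of that split, assuming SQL Server-style syntax (the table and codes are hypothetical): NVARCHAR for human-entered text, plain CHAR for a small fixed set of ASCII-only internal codes.

    -- User-facing text goes in NVARCHAR; the enum-like status code stays CHAR.
    CREATE TABLE customer (
        customer_id  INT IDENTITY PRIMARY KEY,
        display_name NVARCHAR(200) NOT NULL,          -- any language's characters
        status_code  CHAR(1) NOT NULL                 -- internal ASCII-only code
            CHECK (status_code IN ('A', 'I', 'P'))    -- active/inactive/pending
    );

    INSERT INTO customer (display_name, status_code)
    VALUES (N'Björk Guðmundsdóttir', 'A');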
Meanwhile the SQL92 standard explains the feature even less helpfully, stating only that NATIONAL CHARACTER is stored in an implementation-defined character set. As opposed to a mere CHARACTER, which is stored in an implementation-defined character set. Which might be a different implementation-defined character set. Or not.
Coincidentally, this is the same "distinction" the C++ standard makes between char and wchar_t. A relic of the Dark Ages of Character Encoding, when every language/OS combination had its own character set.
Should one use NVARCHAR for all character (non-binary) storage purposes?
It is not important whether the declared type of your column is VARCHAR or NVARCHAR. But it is important to use Unicode (whether UTF-8, UTF-16, or UTF-32) for all character storage purposes.
Are there currently-popular DBMSs in which it will do something undesirable
Yes: in MS SQL Server, using NCHAR makes your (English) data take up twice as much space. Unfortunately, UTF-8 isn't supported yet.
EDIT: SQL Server 2019 finally introduced UTF-8 support.
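A small sketch of the size difference and of the 2019 change, assuming SQL Server 2019+ and one of its built-in _UTF8 collations (the table is invented):

    -- NVARCHAR stores UTF-16 (2 bytes per ASCII character); a _UTF8 collation
    -- lets plain VARCHAR hold Unicode at 1 byte per ASCII character.
    CREATE TABLE size_demo (
        utf16_text NVARCHAR(100),
        utf8_text  VARCHAR(100) COLLATE Latin1_General_100_CI_AS_SC_UTF8
    );

    INSERT INTO size_demo (utf16_text, utf8_text) VALUES (N'hello', 'hello');

    SELECT DATALENGTH(utf16_text) AS utf16_bytes,  -- 10
           DATALENGTH(utf8_text)  AS utf8_bytes    -- 5
    FROM size_demo;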
In Oracle, the database character set can be a multi-byte character set, so you can store all manner of characters in there... but you need to understand and define the length of the columns appropriately (in either BYTE or CHAR length semantics).
NVARCHAR gives you the option of keeping a single-byte database character set (which reduces the potential for confusion between BYTE- and CHARACTER-sized columns) and using NVARCHAR as the multi-byte type.
Since I predominantly work with English data, I'd go with a multi-byte character set (mostly UTF-8) as the database character set and ignore NVARCHAR. If I inherited an old database which was in a single-byte character set and was too big to convert, I may use NVARCHAR. But I'd prefer not to.
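A hedged Oracle-flavoured sketch of the BYTE vs CHAR length semantics mentioned above, assuming an AL32UTF8 database character set (the table and data are invented):

    -- The BYTE/CHAR qualifier decides whether the limit counts bytes or characters;
    -- NVARCHAR2 always uses the national character set (AL16UTF16 or UTF8).
    CREATE TABLE length_demo (
        byte_limited VARCHAR2(10 BYTE),   -- at most 10 bytes
        char_limited VARCHAR2(10 CHAR),   -- at most 10 characters
        national_col NVARCHAR2(10)        -- 10 characters in the national set
    );

    -- 'déjà vu' is 7 characters but 9 bytes in UTF-8, so it still fits in all
    -- three columns; a longer accented string would overflow byte_limited first.
    INSERT INTO length_demo VALUES ('déjà vu', 'déjà vu', N'déjà vu');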