Some columns of my tables need to support Unicode characters (Let's say 1% of all my columns).
I think I have the following two options:
I'm inclined to the second option (in order to not have to change my already existing VARCHAR2 scripts).
My question is: what are the disadvantages and advantages of this second option, when compared with the first one? Is it less performant?
I would strongly lean toward changing the character set of the database.
There are potential downsides of doing so
VARCHAR2
. By default, VARCHAR2(50)
allocates 50 bytes of storage which will allow you to store between 16 and 50 characters if you're using an AL32UTF8 character set rather than being a simple 1:1 mapping like it is if you're using a single-byte character set. That will require either that you increase the size of your columns (i.e. triple them) to ensure that they store the appropriate number of characters or that you specify character-length semantics when you declare the column (i.e. VARCHAR2(50 CHAR)
) or that you set your NLS_LENGTH_SEMANTICS
to CHAR
before creating the objects in order to change the default to character-length semantics. There is a discussion on the Oracle Globalization forum about whether it is appropriate to change the NLS_LENGTH_SEMANTICS at the instance level-- Sergiusz Wolicki, one of Oracle's top globalization gurus suggests strongly against it though I am personally much more willing to consider it under the right circumstances. You can also set the NLS_LENGTH_SEMANTICS
at the session level which is something Sergiusz does not object to but does require that you do so every time you run a script which may be a concern.CHAR_LENGTH
and DATA_LENGTH
columns where they want the length in characters vs. the length in bytes. This may be a minor issue for you or a serious pain if you have existing tools/ scripts/ etc. that run queries against the data dictionary to generate DDL or to determine how much memory needs to be allocated or some other situation where you end up getting funky results.However, those downsides are more than outweighed by the advantages of having a single character set for all your data
NVARCHAR2
columns generally requires application code changes. Since you're going to have both VARCHAR2
and NVARCHAR2
columns, those code changes and configuration settings can be non-trivial and are often a major annoyance. Inevitably, you'll find that you mapped a particular column incorrectly in some application and you'll encounter data corruption bugs that are a pain to track down. This is more true the more layers of abstraction you have between your database and your application.VARCHAR2
to NVARCHAR2
is a pain-- you'll need to add a new column, copy the data over, and remove the old column, rename the new column, and deal with the row migration that resulted. You'll then have to make changes to all your existing applications so that they map the column correctly. When the business decides that one more column needs to support additional languages and your database and applications already support Unicode, that level of effort and testing will seem rather excessive.NVARCHAR2
column as a literal in a SQL statement either in an application (say, to avoid bind variable peeking or to better leverage a histogram) or as part of production support when you want to track down issues in the data.NVARCHAR2
columns is strongly discouraged. This probably doesn't have immediate practical consequences but if your system is supposed to be around for a number of years, it's likely that there will be consequences in the future. Sergiusz sums up Oracle's advice very well in this thread
Oracle's advice:
- For any new database, create it with the AL32UTF8 character set and forget about NCHAR data types.
- For any existing application to be made multilingual, migrate the backend database to AL32UTF8 and forget about NCHAR data types.
- For any existing non-Unicode database serving a large legacy application system that is too costly or impossible to migrate to Unicode, to which you are asked to add a minor module that has to support multilingual data and for which a separate database makes little sense, you may consider NVARCHAR2 columns for this multilingual data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With