Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

VARCHAR max length is 4000 but only 2666 byte long Thai text can be stored

I'm working with Oracle database 11g with PL/SQL. program is written in C#, .NET 4.

I have a table with a varchar(4000) column, and I want to store in DB a text in Thai language. I found that when I try to store text in Thai that its length is longer than 1333 Oracle throws an exception, although the maximum length is 4000:

ORA-01461: can bind a LONG value only for insert into a LONG column

I also noticed that when I work with smaller limitation it works as expected. i.e.: varchar(10) - allows values in the length of 10 and longer text throws the exception:

ORA-01401 inserted value too large for column

From the tests I already made I can assume that from a number 'X' oracle stop counting the actual letter's length. Instead it multiplies the number of letters in the longest letter length value (in Thai the longest letter's length is 3 - meaning it is like 3 chars) and so 1333 * 3 = 3999 and I can only add another one character (in a language like English).

My questions:

  1. Why is there a difference in the behavior for a different maximum length?
  2. Can I know the real limitation of the text before I try to insert it to the table? (assume that text is in Thai and English and might be also in different languages as well)

Important note: Because I need to support a very big and functioning system I can't switch the data type to club but it would solve the problem.

Thanks for your help.


EDIT

I count the number of bytes of those 1333 characters. their byte length is 2666, which is still, less than 4000. I understand that the max length is in bytes but I don't understand why do I get the exception above. I used: System.Text.ASCIIEncoding.Unicode.GetByteCount(text) to check the number of bytes.

like image 850
hadaco Avatar asked Nov 18 '25 16:11

hadaco


1 Answers

The problem

When describing a VARCHAR, you should supply a unit, e.g. VARCHAR2(200 BYTE) or VARCHAR2(200 CHAR). If you omit the unit, the default is BYTE (see Oracle Database Concepts, chapter Oracle Datatypes). This seems to be like a minor detail, but becomes quite severe, when you have multi byte character sets.

Situation up to 11g

Unfortunately there is a hard limit on the maximum size of a VARCHAR2 column. It is 4000 BYTEs (!) (see Oracle Database Reference, chapter Oracle Datatypes) up to Oracle 11g and . This is a hard limit and there is no way around this. The only way around this is a CLOB column.

Solution for 12c

The situation is different on Oracle 12c. There you can use the parameter MAX_STRING_SIZE = EXTENDED to lift the limit up to 32767 BYTEs (see Oracle Database Language Reference, chapter Data Types and Oracle Database Reference, chapter Initialization Parameters). So the obvious solution is: Upgrade to Oracle 12c, set MAX_STRING_SIZE = EXTENDED according to the documentation and alter your table definition. You might loose some indexes when altering your table, because previously to 12c not indexes could not hold VARCHAR2 values with more than 4000 BYTEs and there might still be some limitation. (I have to check the problem with the indexes and if it can be fixed by rebuilding the indexes).

Solution: Change database encoding

You could try to change your native database encoding (the way your database maps CHARs to BYTEs). For this you usually have to create a new database and provide an appropriate parameter for NLS_CHARACTERSET. This is a very big change in how your database operates and might have several side effects. If you ever try to add characters in a different encoding, you might be out of luck (i.e. you cannot store them in your database). So I would not suggest this solution.

Solution: Switch to CLOB

Usually it is not necessary to provide arbitrary queries on such big text fields. You might try to identify the queries selecting on the big text column and migrate them to Oracle Text on a CLOB column. But this is a very big change and might not be possible with your existing schema or your application. You might end up with a bunch of "INSTEAD OF" triggers and with some missing constraint checks (involving the newly created CLOB column).

Solution: Use XML

Instead of a CLOB you could try to store your string as a XML column. Maximum size for these is 4GB. It will hurt your performance, you will have to provide INSTEAD OF triggers and you might lose some constraints, but it could work for you.

like image 126
stefan.schwetschke Avatar answered Nov 20 '25 07:11

stefan.schwetschke