Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to save unicode data to oracle?

I am trying to save unicode data (greek) in oracle database (10 g). I have created a simple table:

alt text

I understand that NVARCHAR2 always uses UTF-16 encoding so it must be fine for all (human) languages.

Then I am trying to insert a string in database. I have hardcoded the string ("How are you?" in Greek) in code. Then I try to get it back from database and show it.

class Program
{
    static string connectionString = "<my connection string>";

    static void Main (string[] args) {
        string textBefore = "Τι κάνεις;";

        DeleteAll ();
        SaveToDatabase (textBefore);
        string textAfter = GetFromDatabase ();

        string beforeData = String.Format ("Before: {0}, ({1})", textBefore, ToHex (textBefore));
        string afterData = String.Format ("After: {0}, ({1})", textAfter, ToHex (textAfter));

        Console.WriteLine (beforeData);
        Console.WriteLine (afterData);

        MessageBox.Show (beforeData);
        MessageBox.Show (afterData);

        Console.ReadLine ();
    }

    static void DeleteAll () {
        using (var oraConnection = new OracleConnection (connectionString)) {
            oraConnection.Open ();
            var command = oraConnection.CreateCommand ();

            command.CommandText = "delete from UNICODEDATA";
            command.ExecuteNonQuery ();
        }            
    }

    static void SaveToDatabase (string stringToSave) {
        using (var oraConnection = new OracleConnection (connectionString)) {
            oraConnection.Open ();
            var command = oraConnection.CreateCommand ();

            command.CommandText = "INSERT into UNICODEDATA (ID, UNICODESTRING) Values (11, :UnicodeString)";
            command.Parameters.Add (":UnicodeString", stringToSave);
            command.ExecuteNonQuery ();
        }
    }

    static string GetFromDatabase () {
        using (var oraConnection = new OracleConnection (connectionString)) {
            oraConnection.Open ();

            var command = oraConnection.CreateCommand ();
            command.CommandText = "Select * from UNICODEDATA";
            var erpReader = command.ExecuteReader ();

            string s = String.Empty;
            while (erpReader.Read ()) {
                string text = erpReader.GetString (1);
                s += text + ", ";
            }

            return s;
        }
    }

    static string ToHex (string input) {
        string bytes = String.Empty;
        foreach (var c in input)
            bytes += ((int)c).ToString ("X4") + " ";

        return bytes;
    }
}

Here are different outputs:

Text before sending to database in a message box: alt text

Text after getting from database in a message box: alt text

Console Output: alt text

Please can you suggest what I might be doing wrong here?

like image 888
Hemant Avatar asked Jul 28 '09 06:07

Hemant


2 Answers

I can see five potential areas for problems:

  1. How are you actually getting the text into your .NET application? If it's hardcoded in a string literal, are you sure that the compiler is assuming the right encoding for your source file?

  2. There could be a problem in how you're sending it to the database.

  3. There could be a problem with how it's being stored in the database.

  4. There could be a problem with how you're fetching it in the database.

  5. There could be a problem with how you're displaying it again afterwards.

Now areas 2-4 sound like they're less likely to be an issue than 1 and 5. How are you displaying the text afterwards? Are you actually fetching it out of the database in .NET, or are you using Toad or something similar to try to see it?

If you're writing it out again from .NET, I suggest you skip the database entirely - if you just display the string itself, what do you see?

I have an article you might find useful on debugging Unicode problems. In particular, concentrate on every place where the encoding could be going wrong, and make sure that whenever you "display" a string you dump out the exact Unicode characters (as integers) so you can check those rather than just whatever your current font wants to display.

EDIT: Okay, so the database is involved somewhere in the problem.

I strongly suggest that you remove anything like ASP and HTML out of the equation. Write a simple console app that does nothing but insert the string and fetch it again. Make it dump the individual Unicode characters (as integers) before and after. Then try to see what's in the database (e.g. using Toad). I don't know the Oracle functions to convert strings into sequences of individual Unicode characters and then convert those characters into integers, but that would quite possibly be the next thing I'd try.

EDIT: Two more suggestions (good to see the console app, btw).

  1. Specify the data type for the parameter, instead of just giving it an object. For instance:

    command.Parameters.Add (":UnicodeString",
                            OracleType.NVarChar).Value = stringToSave;
    
  2. Consider using Oracle's own driver instead of the one built into .NET. You may wish to do this anyway, as it's generally reckoned to be faster and more reliable, I believe.

like image 175
Jon Skeet Avatar answered Sep 21 '22 02:09

Jon Skeet


You can determine what characterset your database uses for NCHAR with the query:

SQL> SELECT VALUE
  2    FROM nls_database_parameters
  3   WHERE parameter = 'NLS_NCHAR_CHARACTERSET';

VALUE
------------
AL16UTF16

to check if your database configuration is correct, you could run the following in SQL*Plus:

SQL> CREATE TABLE unicodedata (ID NUMBER, unicodestring NVARCHAR2(100)); 

Table created
SQL> INSERT INTO unicodedata VALUES (11, 'Τι κάνεις;');

1 row inserted
SQL> SELECT * FROM unicodedata;

        ID UNICODESTRING
---------- ---------------------------------------------------
        11 Τι κάνεις;
like image 39
Vincent Malgrat Avatar answered Sep 20 '22 02:09

Vincent Malgrat