
Convert Firebird fields/domains from ISO8859_1 to UTF8

I have done some research via Google but can't find a proper answer.

I have a Firebird database and I always use my own domains for my table fields. All of those domains are defined with charset ISO8859_1. Now I want to change them to UTF8. If I try this in IBExpert, it gives me this code:

ALTER DOMAIN D_CHAR100 TYPE VARCHAR(100) CHARACTER SET UTF8;

This update works. But does it really work? Are all characters converted correctly, and have I now changed my fields to "real" UTF8? Or do they remain ISO8859_1 internally?

When I search the internet, some say:

  • the solution is a temporary field and copying all the data (a lot of work with big databases)

and others say:

  • changing the domain or field datatype is enough.

Which is right? What could go wrong? We have a lot of customers and I want to convert the database by script.

Andreas asked Oct 21 '22 11:10



1 Answer

Altering the field does not alter any data already stored in that field, and it will expose many problems for you down the line. The best way to do this is to copy the data, but you have more work to do than just that.

Here are some of the issues you'll run into:

  1. Any stored procedures or triggers that use this field must be updated so that their variables use the new character set as well.
  2. A varchar(100) field takes up to 100 bytes in ISO8859_1 but up to 400 bytes in UTF8, because Firebird reserves 4 bytes per UTF8 character. Since the field itself is capped at roughly 32 KB, a UTF8 field can hold at most 8191 characters. Any varchar or char field declared larger than this cannot be converted.
  3. Even after you've converted a varchar(100) field from ISO8859_1 to UTF8, you can still break select statements, because Firebird has a 64KB limit on rows and you're quadrupling the maximum size of these fields.
  4. If the data contains any characters with a byte value above 127 (that is, anything outside plain ASCII), the resulting column will not be selectable. One such character is the one-half character: ½. ISO8859_1 stores it as the single byte 189 (0xBD), which is not a valid UTF8 sequence on its own, so it breaks when the column is told to be UTF8.

Try out these two statements:

select cast('½' as varchar(10) character set ISO8859_1)
from rdb$database

select cast('½' as varchar(10) character set UTF8)
from rdb$database

The first one works, and the second one does not.

In the end, simply altering the field will expose the four issues above, but you won't know they exist until you come across them, which in a complicated database may not be until a production user runs into them. Copying the data means more work for you, but it lets you handle all of the items above correctly.
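The copy approach can be sketched roughly as follows. This is only a minimal outline, not a complete migration script: the table CUSTOMER and the columns NAME and NAME_UTF8 are hypothetical names chosen for illustration, and it assumes nothing (index, constraint, trigger, procedure, view) depends on the old column. If something does, Firebird will refuse the DROP and those objects have to be dropped and recreated around it.

```sql
-- Hypothetical table and column names, for illustration only.

-- 1. Add a new column with the target character set.
ALTER TABLE CUSTOMER ADD NAME_UTF8 VARCHAR(100) CHARACTER SET UTF8;
COMMIT;  -- DDL must be committed before the new column is used in DML

-- 2. Copy the data. The assignment converts from the source column's
--    character set to the target one; rows that cannot be converted
--    should make the statement fail instead of passing silently.
UPDATE CUSTOMER SET NAME_UTF8 = NAME;
COMMIT;

-- 3. Drop the old column and give the new one its name.
ALTER TABLE CUSTOMER DROP NAME;
ALTER TABLE CUSTOMER ALTER COLUMN NAME_UTF8 TO NAME;
COMMIT;
```

Note that the renamed column ends up at the last position in the table, so code relying on column order (for example `select *` with positional access) would need checking. Running this against a restored backup first is the safest way to find the rows and dependencies that cause trouble.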

Two more things to note:

  1. If you copy the data, you may get an error for #4. In that case you should scrub the data as needed, using an external application that can convert these values correctly. For ½, the byte value is 171 in the old DOS code pages and 189 in ISO8859_1, while in Unicode it is code point 189 (U+00BD), which UTF8 stores as two bytes.

  2. Any application code that executes statements against these fields can still violate the 64KB rule from issue #3. At a minimum, you need to check all of the larger fields and the statements that use them to ensure you are not hitting this limit.
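One way to start that search is to query the system tables for every ISO8859_1 text column and its declared length. The query below is a sketch that assumes the standard Firebird metadata tables (RDB$RELATION_FIELDS, RDB$FIELDS, RDB$CHARACTER_SETS); the column aliases are made up for readability, and the factor 4 is the worst-case number of bytes per UTF8 character.

```sql
-- List user-defined CHAR/VARCHAR columns that are still ISO8859_1,
-- largest first, with their worst-case size after conversion to UTF8.
SELECT
    rf.RDB$RELATION_NAME        AS TABLE_NAME,
    rf.RDB$FIELD_NAME           AS COLUMN_NAME,
    rf.RDB$FIELD_SOURCE         AS DOMAIN_NAME,
    f.RDB$CHARACTER_LENGTH      AS CHAR_LEN,
    f.RDB$CHARACTER_LENGTH * 4  AS MAX_UTF8_BYTES
FROM RDB$RELATION_FIELDS rf
JOIN RDB$FIELDS f
    ON f.RDB$FIELD_NAME = rf.RDB$FIELD_SOURCE
JOIN RDB$CHARACTER_SETS cs
    ON cs.RDB$CHARACTER_SET_ID = f.RDB$CHARACTER_SET_ID
WHERE cs.RDB$CHARACTER_SET_NAME = 'ISO8859_1'
  AND f.RDB$FIELD_TYPE IN (14, 37)          -- 14 = CHAR, 37 = VARCHAR
  AND COALESCE(rf.RDB$SYSTEM_FLAG, 0) = 0   -- skip system tables
ORDER BY f.RDB$CHARACTER_LENGTH DESC;
```

Any row with CHAR_LEN above 8191 falls under issue #2, and summing MAX_UTF8_BYTES per table gives a rough idea of which tables come close to the row limit from issue #3. Views appear in the result as well, since their columns are listed in the same system table.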

Paul answered Oct 23 '22 23:10
