I have input data coming from a flat file which has english, japanese, chinese characters in one column. I am loading these values in a staging table column whose schema definition is VARCHAR2(250 CHAR), the main table column has definition VARCHAR2(250) WHICH i can not change. So, i am doing a SUBSTR on this column. After loading the table when i did a
SELECT * FROM TABLE
...I get this error :
ORA-29275: partial multibyte character
If i select other columns then no issues.
you should use SUBSTRB
when you copy your data from your 250 CHAR
column to your 250 byte
column. This function will only output whole characters (you won't get incomplete unicode characters):
SQL> select substrb('中华人', 1, 9) ch9,
2 substrb('中华人', 1, 8) ch8,
3 substrb('中华人', 1, 7) ch7,
4 substrb('中华人', 1, 6) ch6,
5 substrb('中华人', 1, 5) ch5
6 FROM dual;
CH9 CH8 CH7 CH6 CH5
--------- -------- ------- ------ -----
中华人 中华 中华 中华 中
@mwardm made an interesting comment concerning the actual length of the resulting string and whether the resulting string could contain an invalid sequence of bytes. Consider the following on an AL32UTF8 DB:
SQL> select lengthb('ÏÏÏ'),
2 lengthb(substrb('ÏÏÏÏÏÏ', 1, 5)),
3 dump('ÏÏÏ'),
4 dump(substrb('ÏÏÏÏÏÏ', 1, 5))
5 FROM dual;
LE LE DUMP('ÏÏÏ') DUMP(SUBSTRB('ÏÏÏÏÏÏ',1,5))
-- -- ------------------------------------- -------------------------------
6 5 Typ=96 Len=6: 195,143,195,143,195,143 Typ=1 Len=5: 195,143,195,143,32
As you can see the last byte of the substrb
string is not the truncated first byte of the special character but encodes a legit character (The first 128 characters in this character set are the same as the ASCII7US character set so this encodes the ' '
space character, using RTRIM as suggested in another answer will remove the last character).
Furthermore, I also got this interesting result using the character set AL16UTF16:
SQL> select lengthb(N'ĈĈ') le,
2 dump(N'ĈĈ') dump,
3 lengthb(substrb(N'Ĉ', 1, 3)) length_substr,
4 dump(substrb(N'ĈĈ', 1, 3)) dump_substr
5 from dual;
LE DUMP LENGTH_SUBSTR DUMP_SUBSTR
---------- ----------------------- ------------- -----------------
4 Typ=96 Len=4: 1,8,1,8 2 Typ=1 Len=2: 1,8
In this case Oracle has choosen to cut the string after the second byte because in the AL16UTF16 character set there is no legit one-byte character. The resulting string is only 2 bytes instead of 3.
This would need further testing and is by no mean a rigorous demonstration but I still stand by my first hunch that substrb
will return a valid sequence of bytes that encodes a valid string of characters.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With