 

How to take a substring of a UTF-8 string in Java?

Suppose I have the following string: "Rückruf ins Ausland". I need to insert it into a database column with a maximum size of 10. I did a normal substring in Java, which extracted "Rückruf in" (10 characters). When I try to insert this value into the column, I get the following Oracle error:

java.sql.SQLException: ORA-12899: value too large for column "WAEL"."TESTTBL"."DESC" (actual: 11, maximum: 10)

The reason is that the database uses the AL32UTF8 character set, so the ü takes 2 bytes.
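To see where the "actual: 11" comes from, the sketch below compares the char count of the 10-character prefix with its UTF-8 byte count. It assumes that AL32UTF8 stores the same bytes as Java's StandardCharsets.UTF_8 (which holds for a character like ü); the class name ByteLengthDemo is illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class ByteLengthDemo {

    // Number of bytes the string occupies when encoded as UTF-8
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // \u00fc is ü; the escape keeps the example independent of the source-file encoding
        String prefix = "R\u00fcckruf ins Ausland".substring(0, 10);
        System.out.println(prefix);             // Rückruf in
        System.out.println(prefix.length());    // 10 (UTF-16 chars)
        System.out.println(utf8Length(prefix)); // 11 (ü is encoded as the two bytes 0xC3 0xBC)
    }
}
```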

I need to write a function in Java that does this substring while taking into account that the ü takes 2 bytes, so that in this case the returned substring is "Rückruf i" (9 characters, 10 bytes). Any suggestions?

Wael, asked Jul 16 '15 at 13:07


People also ask

How do you substring a string in Java?

The substring begins with the character at the specified index and extends either to the end of the string or, if a second argument is given, up to index endIndex - 1. Syntax: public String substring(int beginIndex, int endIndex). Parameters: beginIndex, the beginning index, inclusive; endIndex, the ending index, exclusive. Both indices count UTF-16 char values, not bytes.
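A short illustration of both overloads, using the string from the question (the class name SubstringDemo is illustrative). Because the indices count chars, substring(0, 10) always yields 10 chars regardless of how many bytes they need in UTF-8.

```java
public class SubstringDemo {
    public static void main(String[] args) {
        String s = "R\u00fcckruf ins Ausland"; // \u00fc is ü

        // beginIndex inclusive, endIndex exclusive: chars at indices 0..9
        System.out.println(s.substring(0, 10)); // Rückruf in

        // One-argument form runs to the end of the string
        System.out.println(s.substring(12));    // Ausland
    }
}
```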

How do I convert a string to UTF-8 in Java?

To convert a String into UTF-8, use the getBytes() method, which encodes the String into a sequence of bytes and returns a byte array. Its signature is public byte[] getBytes(String charsetName), where charsetName is the name of the charset used to encode the String (for example "UTF-8").
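A minimal sketch comparing the charset-name overload with the getBytes(Charset) overload, which takes a constant such as StandardCharsets.UTF_8 and does not throw the checked UnsupportedEncodingException. The class and the helper toUtf8 are illustrative names, not library API.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesDemo {

    // Charset overload: no checked exception to handle
    public static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "R\u00fcck"; // "Rück"

        // Charset-name overload; declares UnsupportedEncodingException
        byte[] viaName = s.getBytes("UTF-8");
        byte[] viaCharset = toUtf8(s);

        System.out.println(Arrays.equals(viaName, viaCharset)); // true
        System.out.println(viaCharset.length);                  // 5 (R, c, k = 1 byte each; ü = 2)

        // Decoding with the same charset restores the original String
        System.out.println(new String(viaCharset, StandardCharsets.UTF_8).equals(s)); // true
    }
}
```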

Are Java strings UTF-8?

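String objects in Java are encoded in UTF-16. The Java platform is also required to support other character encodings (charsets) such as US-ASCII, ISO-8859-1, and UTF-8. Errors may occur when converting between character data in different encodings. There are two general types of encoding errors: malformed input (for example, a byte sequence that is not valid UTF-8) and unmappable characters (characters the target charset cannot represent, such as ü in US-ASCII).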
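A sketch of these points, with an illustrative class name: a code point outside the Basic Multilingual Plane occupies two UTF-16 chars (a surrogate pair), and a strict CharsetEncoder or CharsetDecoder (CodingErrorAction.REPORT) surfaces the two error types as UnmappableCharacterException and MalformedInputException. The helpers encodeAscii and decodeUtf8 are hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingErrorsDemo {

    // Strictly encodes s as US-ASCII; returns "ok" or the simple name of the exception
    public static String encodeAscii(String s) {
        try {
            StandardCharsets.US_ASCII.newEncoder()
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(s));
            return "ok";
        } catch (CharacterCodingException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Strictly decodes bytes as UTF-8; returns "ok" or the simple name of the exception
    public static String decodeUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return "ok";
        } catch (CharacterCodingException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so UTF-16 stores it as a surrogate pair
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1

        System.out.println(encodeAscii("R\u00fcck"));             // UnmappableCharacterException
        System.out.println(decodeUtf8(new byte[] {(byte) 0xC3})); // MalformedInputException (truncated sequence)
    }
}
```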


1 Answer

If you want to trim the data in Java, you must write a function that trims the string according to the charset the database uses, something like this test case:

package test;

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TrimField {

    public static void main(String[] args) {
        // UTF-8 matches the db charset (AL32UTF8)
        System.out.println(trim("Rückruf ins Ausland", 10, StandardCharsets.UTF_8));  // Rückruf i
        System.out.println(trim("Rüückruf ins Ausland", 10, StandardCharsets.UTF_8)); // Rüückruf
    }

    // Removes characters from the end of value until its encoding
    // in the given charset is at most numBytes bytes long.
    public static String trim(String value, int numBytes, Charset charset) {
        while (value.length() > 0) {
            if (value.getBytes(charset).length <= numBytes) {
                return value;
            }
            // Step back one code point rather than one char, so a
            // surrogate pair (e.g. an emoji) is never split in half
            int end = value.offsetByCodePoints(value.length(), -1);
            value = value.substring(0, end);
        }
        return "";
    }

}
Giovanni, answered Nov 01 '22 at 10:11
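The trim loop re-encodes the whole string after every removed character, which is quadratic in the string length in the worst case. As an alternative sketch, assuming the database charset is AL32UTF8 (standard UTF-8), the byte length of each code point can be computed directly, so a single forward pass suffices. The class Utf8Truncate and its truncateUtf8 method are hypothetical names, not a library API, and this only works for UTF-8, unlike trim, which accepts any charset.

```java
public class Utf8Truncate {

    // UTF-8 byte length of a single Unicode code point.
    // A lone surrogate is counted as 3 bytes; getBytes would emit 1 byte ('?') for it,
    // so the count errs on the safe side and the result never exceeds maxBytes.
    static int utf8Bytes(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Longest prefix of value whose UTF-8 encoding fits in maxBytes; never splits a code point
    public static String truncateUtf8(String value, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            int size = utf8Bytes(cp);
            if (bytes + size > maxBytes) {
                break;
            }
            bytes += size;
            i += Character.charCount(cp); // advance 1 or 2 chars
        }
        return value.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(truncateUtf8("R\u00fcckruf ins Ausland", 10));       // Rückruf i
        System.out.println(truncateUtf8("R\u00fc\u00fcckruf ins Ausland", 10)); // Rüückruf
    }
}
```

For the two test strings in the answer, both approaches return the same prefixes; the single-pass version simply avoids the repeated getBytes calls.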