Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte

People also ask

How do I convert UTF-8 to ISO-8859-1?

Going backwards from UTF-8 to ISO-8859-1 will cause "replacement characters" (�) to appear in your text when unsupported characters are found. byte[] utf8 = ... byte[] latin1 = new String(utf8, "UTF-8"). getBytes("ISO-8859-1"); You can exercise more control by using the lower-level Charset APIs.

What is the difference between UTF-8 and ISO-8859-1?

UTF-8 is a multibyte encoding that can represent any Unicode character. ISO 8859-1 is a single-byte encoding that can represent the first 256 Unicode characters. Both encode ASCII exactly the same way.

Is UTF-8 a byte?

UTF-8 uses one byte to represent code points from 0-127. These first 128 Unicode code points correspond one-to-one with ASCII character mappings, so ASCII characters are also valid UTF-8 characters.

How do I change my UTF-8 character set?

Click Tools, then select Web options. Go to the Encoding tab. In the dropdown for Save this document as: choose Unicode (UTF-8). Click Ok.

If you're dealing with character encodings other than UTF-16, you shouldn't be using java.lang.String or the char primitive -- you should only be using byte[] arrays or ByteBuffer objects. Then, you can use java.nio.charset.Charset to convert between encodings:

Charset utf8charset = Charset.forName("UTF-8");
Charset iso88591charset = Charset.forName("ISO-8859-1");

ByteBuffer inputBuffer = ByteBuffer.wrap(new byte[]{(byte)0xC3, (byte)0xA2});

// decode UTF-8
CharBuffer data = utf8charset.decode(inputBuffer);

// encode ISO-8559-1
ByteBuffer outputBuffer = iso88591charset.encode(data);
byte[] outputData = outputBuffer.array();

byte[] iso88591Data = theString.getBytes("ISO-8859-1");

Will do the trick. From your description it seems as if you're trying to "store an ISO-8859-1 String". String objects in Java are always implicitly encoded in UTF-16. There's no way to change that encoding.

What you can do, 'though is to get the bytes that constitute some other encoding of it (using the .getBytes() method as shown above).

Starting with a set of bytes which encode a string using UTF-8, creates a string from that data, then get some bytes encoding the string in a different encoding:

    byte[] utf8bytes = { (byte)0xc3, (byte)0xa2, 0x61, 0x62, 0x63, 0x64 };
    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");

    String string = new String ( utf8bytes, utf8charset );

    System.out.println(string);

    // "When I do a getbytes(encoding) and "
    byte[] iso88591bytes = string.getBytes(iso88591charset);

    for ( byte b : iso88591bytes )
        System.out.printf("%02x ", b);

    System.out.println();

    // "then create a new string with the bytes in ISO-8859-1 encoding"
    String string2 = new String ( iso88591bytes, iso88591charset );

    // "I get a two different chars"
    System.out.println(string2);

this outputs strings and the iso88591 bytes correctly:

âabcd 
e2 61 62 63 64 
âabcd

So your byte array wasn't paired with the correct encoding:

    String failString = new String ( utf8bytes, iso88591charset );

    System.out.println(failString);

Outputs

Ã¢abcd

(either that, or you just wrote the utf8 bytes to a file and read them elsewhere as iso88591)

This is what I needed:

public static byte[] encode(byte[] arr, String fromCharsetName) {
    return encode(arr, Charset.forName(fromCharsetName), Charset.forName("UTF-8"));
}

public static byte[] encode(byte[] arr, String fromCharsetName, String targetCharsetName) {
    return encode(arr, Charset.forName(fromCharsetName), Charset.forName(targetCharsetName));
}

public static byte[] encode(byte[] arr, Charset sourceCharset, Charset targetCharset) {

    ByteBuffer inputBuffer = ByteBuffer.wrap( arr );

    CharBuffer data = sourceCharset.decode(inputBuffer);

    ByteBuffer outputBuffer = targetCharset.encode(data);
    byte[] outputData = outputBuffer.array();

    return outputData;
}

Related questions
                            
                                Mockito: Verifying with generic parameters
                            
                                How to add a linked source folder in Android Studio?
                            
                                Instantiating generics type in java
                            
                                What is the Maven way for automatic project versions when doing continuous delivery?
                            
                                Recommendations for java captcha libraries [closed]
                            
                                Android: why setVisibility(View.GONE); or setVisibility(View.INVISIBLE); do not work
                            
                                How to find the minimum value in an ArrayList, along with the index number? (Java)
                            
                                Java/ImageIO getting image dimensions without reading the entire file?
                            
                                How to get full REST request body using Jersey?
                            
                                What is the most efficient way to convert an int to a String?
                            
                                How to stop a thread created by implementing runnable interface?
                            
                                How to pass system properties to a jar file
                            
                                How do I extract a tar file in Java?
                            
                                If you overwrite a field in a subclass of a class, the subclass has two fields with the same name(and different type)?
                            
                                Unable to locate Source XRef to link to
                            
                                Exception Vs Assertion
                            
                                How to turn off hbm2ddl?
                            
                                Usages of doThrow() doAnswer() doNothing() and doReturn() in mockito
                            
                                Android proguard, keep inner class
                            
                                Spring Data JPA Update @Query not updating?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte

Tags:

java

utf-8

iso-8859-1

People also ask

Recent Activity

Donate For Us