
How to convert Strings to and from UTF8 byte arrays in Java

People also ask

Can we convert string to byte array in Java?

A Java String is a sequence of UTF-16 code units (char values). To convert it to a byte array, we translate that sequence of chars into a sequence of bytes. For this translation we use an instance of Charset, the class that defines the mapping between a sequence of chars and a sequence of bytes.

Which method is used to convert a string to an array of bytes?

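The String class has a getBytes() method that converts a String to a byte array. The no-argument getBytes() encodes the String into a sequence of bytes using the platform's default charset and stores the result in a new byte array. Before Java 18 that default varied between environments, so pass a charset explicitly, as in getBytes(StandardCharsets.UTF_8), when you need UTF-8.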
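A minimal sketch of what "platform's default charset" means in practice. The class and method names (DefaultCharsetDemo, encodeDefault, encodeExplicitDefault) are made up for illustration; the printed charset depends on the JVM and its settings, so no particular output is assumed.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // Encodes with whatever charset the running JVM treats as its default.
    static byte[] encodeDefault(String s) {
        return s.getBytes();
    }

    // Same result, but the dependency on the default charset is spelled out.
    static byte[] encodeExplicitDefault(String s) {
        return s.getBytes(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        // The name printed here varies by platform and Java version.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println(Arrays.equals(encodeDefault("abc"), encodeExplicitDefault("abc")));
    }
}
```

If the bytes leave the process (files, sockets, databases), naming the charset explicitly avoids depending on this setting.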

Which method converts string to byte value syntax?

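To turn the text of a number into a single byte value, use the parseByte() method of the Byte class. Note that this parses a number such as "42" into the byte 42; it does not encode the characters of a String, which is what getBytes() does.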
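The difference between the two calls is easy to miss, so here is a small sketch contrasting them. The class and helper names are hypothetical; only Byte.parseByte and String.getBytes come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class ParseByteVsGetBytes {
    // Parses the decimal text of one number into a single byte value.
    static byte parseNumber(String text) {
        return Byte.parseByte(text);
    }

    // Encodes every character of the text as UTF-8 bytes.
    static byte[] encodeText(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(parseNumber("42"));        // prints 42
        System.out.println(encodeText("42").length);  // prints 2 (the bytes for '4' and '2')
    }
}
```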


Convert from String to byte[]:

String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);

Convert from byte[] to String:

byte[] b = {(byte) 99, (byte)97, (byte)116};
String s = new String(b, StandardCharsets.US_ASCII);

You should, of course, use the encoding the bytes were actually written in. My examples used US-ASCII and UTF-8, the two most common encodings. The StandardCharsets constants are available from Java 7 onward.
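To see why the choice of encoding matters, here is a short round-trip sketch with a non-ASCII character. The class and helper names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String fromUtf8(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues
        byte[] encoded = toUtf8(original);

        // Three ASCII letters take one byte each; the accented letter takes two.
        System.out.println(encoded.length);                      // prints 5
        System.out.println(fromUtf8(encoded).equals(original));  // prints true

        // Decoding with the wrong charset does not throw; it silently loses data.
        String wrong = new String(encoded, StandardCharsets.US_ASCII);
        System.out.println(wrong.equals(original));              // prints false
    }
}
```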


Here's a solution that avoids performing the Charset lookup for every conversion:

import java.nio.charset.Charset;

// static, so the lookup runs once per class rather than once per instance
private static final Charset UTF8_CHARSET = Charset.forName("UTF-8");

String decodeUTF8(byte[] bytes) {
    return new String(bytes, UTF8_CHARSET);
}

byte[] encodeUTF8(String string) {
    return string.getBytes(UTF8_CHARSET);
}

String original = "hello world";
byte[] utf8Bytes = original.getBytes("UTF-8");

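You can convert directly via the String(byte[], String) constructor and the getBytes(String) method. Both take a charset name and throw the checked UnsupportedEncodingException if that name is not recognized. Java exposes available character sets via the Charset class. The JDK documentation lists supported encodings.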
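A sketch of working with charset names, assuming you receive the name as a string (for example from configuration). The class NamedCharset and its helpers are hypothetical; Charset.isSupported and getBytes(String) are the JDK calls being shown.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class NamedCharset {
    // Reports whether this JVM has a charset registered under the given name.
    static boolean isKnown(String charsetName) {
        return Charset.isSupported(charsetName);
    }

    // Encodes by charset name, converting the checked exception into an unchecked one.
    static byte[] encode(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isKnown("UTF-8"));                       // prints true
        System.out.println(isKnown("no-such-charset"));             // prints false
        System.out.println(encode("hello world", "UTF-8").length);  // prints 11
    }
}
```

When the charset is fixed at compile time, the Charset-typed overloads avoid the checked exception altogether.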

Most of the time, such conversions are performed on streams, so you'd use the Reader/Writer classes. You should not incrementally decode arbitrary byte streams using the String methods: a multibyte character can be split across two reads, and each fragment then decodes incorrectly.
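The multibyte problem can be reproduced with a few bytes. The sketch below is illustrative only: the class and method names are invented, and a ByteArrayInputStream stands in for a real file or socket. It decodes the same input once chunk by chunk with the String constructor and once through an InputStreamReader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamDecoding {
    // Buggy: each chunk is decoded on its own, so a character whose bytes
    // straddle a chunk boundary is corrupted. chunkSize must be positive.
    static String decodeChunkByChunk(InputStream in, int chunkSize) {
        try {
            StringBuilder out = new StringBuilder();
            byte[] buf = new byte[chunkSize];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.append(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The Reader keeps incomplete byte sequences until the rest arrives.
    static String decodeWithReader(InputStream in, int chunkSize) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            StringBuilder out = new StringBuilder();
            char[] buf = new char[chunkSize];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // 5 bytes in UTF-8; the last character uses bytes 4 and 5
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

        // A 4-byte chunk ends in the middle of the final character.
        System.out.println(decodeChunkByChunk(new ByteArrayInputStream(bytes), 4).equals(text)); // prints false
        System.out.println(decodeWithReader(new ByteArrayInputStream(bytes), 4).equals(text));   // prints true
    }
}
```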