
Is Java UTF-8 Charset exception possible?

Can Java possibly throw UnsupportedEncodingException when using "UTF-8" encoding, or can I safely suppress it?

Alex Abdugafarov asked Feb 19 '11 07:02



1 Answer

As McDowell noted in a comment on templatetypedef's answer: if you pass a Charset object instead of the charset's name when you instantiate a new String, you don't have to deal with UnsupportedEncodingException or any other checked exception:

import java.nio.charset.Charset;

byte[] bytes = ...;

// Requires you to handle UnsupportedEncodingException
String s1 = new String(bytes, "UTF-8");

// Doesn't require you to handle any checked exceptions
String s2 = new String(bytes, Charset.forName("UTF-8"));

It's an inconsistency in Java's standard library that we have to live with...
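If you are stuck with the charset-name overload, one common way to suppress the checked exception is to catch it and rethrow it as an unchecked error. Every implementation of the Java platform is required to support UTF-8, so the catch branch should never run. A minimal sketch; the class name `Utf8Legacy` and the method `decodeUtf8` are hypothetical names chosen for illustration:

```java
import java.io.UnsupportedEncodingException;

class Utf8Legacy {
    // Decodes bytes as UTF-8 through the charset-name overload,
    // hiding the checked exception from callers.
    static String decodeUtf8(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is one of the standard charsets that every Java
            // platform must provide, so this is treated as a broken
            // runtime rather than a recoverable condition.
            throw new AssertionError("UTF-8 must be supported", e);
        }
    }
}
```

Rethrowing instead of swallowing the exception keeps the failure visible if the assumption is ever violated.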

Note that Charset.forName(...) can throw exceptions (IllegalCharsetNameException, IllegalArgumentException, UnsupportedCharsetException), but these are all unchecked exceptions, so you don't have to catch or re-throw them yourself.
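To see which unchecked exception Charset.forName(...) raises in each case, here is a small sketch. The class `CharsetLookup`, the method `describe`, and the returned strings are made up for this example; only the Charset calls and exception types come from the JDK:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

class CharsetLookup {
    // Returns a short description of what Charset.forName does with the name.
    static String describe(String name) {
        try {
            return "ok: " + Charset.forName(name).name();
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return "illegal name";
        } catch (UnsupportedCharsetException e) {
            // The name is well-formed, but this JVM has no such charset.
            return "unsupported";
        } catch (IllegalArgumentException e) {
            // Thrown when the name is null.
            return "null name";
        }
    }
}
```

For example, describe("UTF-8") returns "ok: UTF-8", while a name containing a space ends up in the first catch branch. Both specific exceptions are subclasses of IllegalArgumentException, which is why the general catch comes last.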

Edit: Since Java 7 there is a class java.nio.charset.StandardCharsets, which has constants for frequently used character encodings. Example:

String s3 = new String(bytes, StandardCharsets.UTF_8);
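
The same constant also works in the other direction, since String.getBytes has a Charset overload as well. A small round-trip sketch; the class `Utf8RoundTrip` and its method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Encodes text to UTF-8 bytes; no checked exception to handle.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back to a String; no checked exception either.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Neither method needs a try/catch or a throws clause, and there is no charset name to mistype.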

Jesper answered Oct 01 '22 03:10