How can I check if a String is encodable in some encoding?

Tags: java, string, character-encoding, encoding

The following test fails for the Latin-1 conversion, because characters that do not exist in Latin-1 are replaced with the byte value 63 (a question mark). I would rather have such characters cause an exception.

  import java.io.UnsupportedEncodingException;
  import java.util.Arrays;

  import org.junit.Assert;
  import org.junit.Test;

  public class EncodingTest {

    @Test
    public void testEncoding() throws UnsupportedEncodingException {
      final String czech = "Řízeček a šampáňo a žízeň";
      // okay
      final byte[] bytesInLatin2 = czech.getBytes("ISO8859-2");
      // different bytes, but okay
      final byte[] bytesInWin1250 = czech.getBytes("Windows-1250");
      // different bytes, but okay
      final byte[] bytesInUtf8 = czech.getBytes("UTF-8");
      // nonsense; Ř, č, ... are not in the Latin-1 character set!
      final byte[] bytesInLatin1 = czech.getBytes("ISO8859-1");

      System.out.println(Arrays.toString(bytesInLatin2));
      System.out.println(Arrays.toString(bytesInWin1250));
      System.out.println(Arrays.toString(bytesInUtf8));
      System.out.println(Arrays.toString(bytesInLatin1));
      System.out.flush();

      final String latin2 = new String(bytesInLatin2, "ISO8859-2");
      final String win1250 = new String(bytesInWin1250, "Windows-1250");
      final String utf8 = new String(bytesInUtf8, "UTF-8");
      final String latin1 = new String(bytesInLatin1, "ISO8859-1");

      Assert.assertEquals("latin2", czech, latin2);
      Assert.assertEquals("win1250", czech, win1250);
      Assert.assertEquals("utf8", czech, utf8);
      Assert.assertEquals("latin1", czech, latin1); // this assertion fails
    }
  }
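The replacement behaviour described above can be reproduced in isolation. This is a minimal sketch; the class name `ReplacementDemo` and the helper `toLatin1` are hypothetical names chosen for illustration. The character 'Ř' is written as the escape `\u0158` so the source file stays ASCII-only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReplacementDemo {
    // Hypothetical helper: encodes with String.getBytes, which silently
    // substitutes the charset's replacement byte for unmappable characters
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u0158" is 'R with caron', which has no mapping in ISO-8859-1
        byte[] bytes = toLatin1("\u0158");
        System.out.println(Arrays.toString(bytes));                          // [63]
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));  // ?
    }
}
```

No exception is raised: the information is simply lost, which is why the round trip in the test above cannot succeed.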

There are many situations where data ends up corrupted because of this behaviour of Java. Is there a library that can check whether a String is encodable in a given encoding?

asked Jun 03 '13 17:06

dmatej

2 Answers

I suspect you're looking for CharsetEncoder.canEncode(CharSequence).

  Charset latin2 = Charset.forName("ISO8859-2");
  boolean validInLatin2 = latin2.newEncoder().canEncode(czech);
  ...
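A self-contained sketch of the `canEncode` approach, checking the question's sample string against each charset from the test. The class `CanEncodeCheck` and the helper `isEncodable` are hypothetical names; the Czech text is written with Unicode escapes so the file compiles regardless of the platform's default source encoding.

```java
import java.nio.charset.Charset;

class CanEncodeCheck {
    // Hypothetical helper: true if every character of text is representable
    // in the named charset. A fresh encoder is created per call because
    // CharsetEncoder instances are stateful and not thread-safe.
    static boolean isEncodable(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The question's Czech sample string, written with Unicode escapes
        String czech = "\u0158\u00edze\u010dek a \u0161amp\u00e1\u0148o a \u017e\u00edze\u0148";
        for (String name : new String[] {"ISO-8859-2", "windows-1250", "UTF-8", "ISO-8859-1"}) {
            System.out.println(name + ": " + isEncodable(czech, name));
        }
    }
}
```

For this string the first three charsets report `true` and ISO-8859-1 reports `false`, matching which round trips in the question's test succeed.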

answered Oct 29 '22 13:10

Jon Skeet

As an alternative to Jon Skeet's suggestion, you can also use the CharsetEncoder class to do the encoding directly (with its encode method), but first call its onMalformedInput and onUnmappableCharacter methods (for example with CodingErrorAction.REPORT) to specify what the encoder should do when it encounters bad input.

That way, in the normal case you just make a single encode call, but if the input contains anything the charset cannot represent, you get an exception (a CharacterCodingException) instead of silently replaced bytes.
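A minimal sketch of this strict-encoding approach, assuming the caller wants a `byte[]` just as `String.getBytes` returns. The names `StrictEncoding` and `encodeStrict` are hypothetical. Both error actions are set to `CodingErrorAction.REPORT` explicitly so the intent is visible in the code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

class StrictEncoding {
    // Hypothetical helper: encodes text, throwing CharacterCodingException
    // (e.g. UnmappableCharacterException) instead of substituting '?'
    static byte[] encodeStrict(String text, String charsetName) throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        // Copy only the bytes actually written (position 0 up to the limit)
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // The question's Czech sample string, written with Unicode escapes
        String czech = "\u0158\u00edze\u010dek a \u0161amp\u00e1\u0148o a \u017e\u00edze\u0148";
        try {
            System.out.println(Arrays.toString(encodeStrict(czech, "ISO-8859-2")));
            encodeStrict(czech, "ISO-8859-1"); // fails on the first character
        } catch (CharacterCodingException e) {
            System.out.println("Cannot encode: " + e);
        }
    }
}
```

Compared with calling `canEncode` first and then `getBytes`, this walks the string only once and fails at the point of the first unmappable character.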

answered Oct 29 '22 12:10

James Holderness
