
Why is text in Swedish from a resource bundle showing up as gibberish? [duplicate]

Possible Duplicate:
How to use UTF-8 in resource properties with ResourceBundle

I want to add internationalization to my Java Swing application. I keep all labels in a resource bundle file.

As a test, I tried to set a Swedish title on a JButton. So in the bundle file I wrote:

nextStepButton=nästa

And in the Java code I wrote:

nextStepButton.setText(bundle.getString("nextStepButton"));

But the characters of the button title appear wrong at runtime:
[Screenshot of the button with garbled title text]

I am using the Tahoma font, which supports Unicode. When I set the button title manually through code it appears fine:

nextStepButton.setText("nästa");

Any idea why it fails when the text comes from the bundle file?

Edit: Encoding the title:
I have tried re-encoding the text coming from the bundle file using this code:

nextStepButton.setText(new String(bundle.getString("nextStepButton").getBytes("UTF-8")));

And still the result is the same garbled text.

Brad asked Dec 13 '10 12:12


3 Answers

As per the javadoc, properties files are read using ISO-8859-1.

"... the input/output stream is encoded in ISO 8859-1 character encoding. Characters that cannot be directly represented in this encoding can be written using Unicode escapes; only a single 'u' character is allowed in an escape sequence. The native2ascii tool can be used to convert property files to and from other character encodings."
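The behaviour quoted above can be checked with a small sketch. It assumes the plain java.util.Properties API rather than a ResourceBundle, and the class and method names (PropertiesEncodingDemo, loadFromStream, loadFromUtf8Reader) are made up for illustration. The same bytes are loaded once through an InputStream, which is decoded as ISO 8859-1, and once through a Reader with an explicit UTF-8 decoder:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {

    // Loads the bytes via Properties.load(InputStream), which decodes them as ISO 8859-1.
    static String loadFromStream(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("nextStepButton");
    }

    // Loads the same bytes via Properties.load(Reader) with an explicit UTF-8 decoder.
    static String loadFromUtf8Reader(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new InputStreamReader(
                    new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("nextStepButton");
    }

    public static void main(String[] args) {
        // A properties file saved as UTF-8 with a raw a-diaeresis in the value.
        byte[] utf8File = "nextStepButton=n\u00e4sta".getBytes(StandardCharsets.UTF_8);
        // The same entry written in pure ASCII with a Unicode escape.
        byte[] escapedFile = "nextStepButton=n\\u00e4sta".getBytes(StandardCharsets.US_ASCII);

        String expected = "n\u00e4sta";
        System.out.println(loadFromStream(utf8File).equals(expected));     // false
        System.out.println(loadFromStream(escapedFile).equals(expected));  // true
        System.out.println(loadFromUtf8Reader(utf8File).equals(expected)); // true
    }
}
```

The comparisons are printed as booleans so that the result does not depend on the console encoding.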

Apart from using the native2ascii tool to convert UTF-8 properties files to ISO-8859-1 properties files, you can also use a custom ResourceBundle.Control so that you can control the loading of properties files and use UTF-8 there. Here's a kickoff example:

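import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class UTF8Control extends ResourceBundle.Control {
    @Override
    public ResourceBundle newBundle
        (String baseName, Locale locale, String format, ClassLoader loader, boolean reload)
            throws IllegalAccessException, InstantiationException, IOException
    {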
        // The below is a copy of the default implementation.
        String bundleName = toBundleName(baseName, locale);
        String resourceName = toResourceName(bundleName, "properties");
        ResourceBundle bundle = null;
        InputStream stream = null;
        if (reload) {
            URL url = loader.getResource(resourceName);
            if (url != null) {
                URLConnection connection = url.openConnection();
                if (connection != null) {
                    connection.setUseCaches(false);
                    stream = connection.getInputStream();
                }
            }
        } else {
            stream = loader.getResourceAsStream(resourceName);
        }
        if (stream != null) {
            try {
                // Only this line is changed to make it to read properties files as UTF-8.
                bundle = new PropertyResourceBundle(new InputStreamReader(stream, "UTF-8"));
            } finally {
                stream.close();
            }
        }
        return bundle;
    }
}

Use it as follows:

ResourceBundle bundle = ResourceBundle.getBundle("com.example.i18n.text", new UTF8Control());

This way you don't need to bother with the native2ascii tool, and you end up with properties files that are easier to maintain.

See also:

  • Unicode - How to get the characters right?
BalusC answered Nov 20 '22 12:11


Take a look at the Java Internationalization FAQ. If you've put non-ASCII characters in your .properties file, you must convert it using the native2ascii tool. Then everything should work.
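To make the conversion concrete, here is a rough sketch of the kind of rewriting such a tool performs: every character outside 7-bit ASCII is replaced by a \uXXXX escape. This is not the tool's actual implementation, and AsciiEscaper and escapeNonAscii are hypothetical names chosen for the example:

```java
public class AsciiEscaper {

    // Replaces every char outside 7-bit ASCII with a backslash-u escape
    // (four hex digits per UTF-16 code unit); ASCII chars pass through unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The value from the question, with the a-diaeresis written as a Java escape.
        System.out.println(escapeNonAscii("nextStepButton=n\u00e4sta"));
        // prints: nextStepButton=n\u00e4sta
    }
}
```

The resulting line is pure ASCII, so it reads the same whether the file is decoded as ISO 8859-1 or as UTF-8.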

darioo answered Nov 20 '22 12:11


The problem is that the resource bundle properties file is encoded in UTF-8 but your application is loading it using Latin-1.

If you take "LATIN SMALL LETTER A WITH DIAERESIS" (E4 in Latin-1, or U+00E4 as a Unicode code point) and encode it as UTF-8, you get the bytes C3 A4. If you then decode those bytes as Latin-1, you get "LATIN CAPITAL LETTER A WITH TILDE" followed by the square-looking "CURRENCY SIGN" character, which is exactly how the characters appear in your screenshot of the button.
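The byte arithmetic above can be reproduced in a few lines. This is only an illustrative sketch; MojibakeDemo, misdecode, repair and hex are names invented for the example:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes text as UTF-8, then (wrongly) decodes the bytes as Latin-1.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the damage: recovers the original bytes, then decodes them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u00e4"; // a-diaeresis
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8))); // C3 A4

        String garbled = misdecode(original);
        // A-tilde (U+00C3) followed by the currency sign (U+00A4).
        System.out.println(garbled.equals("\u00c3\u00a4"));   // true
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

Note that repair goes through ISO_8859_1 to get the original bytes back. The attempt in the question calls getBytes("UTF-8") on the already garbled string instead, which does not reverse the damage.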

(Incidentally, here's a neologism for the mangling you get as a result of using the wrong character encoding ... mojibake. Baffle your friends by using it in conversation.)

Stephen C answered Nov 20 '22 11:11