 

Trying to show Arabic characters in Java

I am trying to show Arabic characters in a Java applet, but I always get question marks ('?????').

I tried many solutions with no success:

  • Using new String(byte[], charsetName) to decode the bytes as UTF-8.
  • Changing the default charset in NetBeans: -Dfile.encoding=UTF8 in the VM options and -encoding UTF8 in the compiler options.
  • Using a ByteArrayOutputStream for the encoding.
  • Using both the UTF8 and UTF-8 charset names.

I am using Windows 7 in a Spanish-language environment.

Some solutions work when running inside NetBeans, but they do not work outside that environment. Here is the NetBeans project with sources and .jar.

This is the simple code I am using:

package javaapplication4;

import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import javax.swing.JApplet;
import javax.swing.JOptionPane;

public class JavaApplication4 extends JApplet {

    @Override
    public void init() {
        try {

            String str1 = new String("تعطي يونيكود رقما فريدا لكل حرف".getBytes(), "UTF-8");
            JOptionPane.showMessageDialog(rootPane, str1);

            String str2 = new String("تعطي يونيكود رقما فر");
            ByteArrayOutputStream os = new ByteArrayOutputStream();
            os.write(str2.getBytes());
            JOptionPane.showMessageDialog(rootPane, os.toString("UTF-8"));

        } catch (Exception ex) {
            JOptionPane.showMessageDialog(rootPane, ex.toString());
        }
    }
}

Any idea of what is happening?

J punto Marcos asked Feb 21 '13 09:02



1 Answer

The easiest solution is to use strings normally, without any manual conversion, and to change the default text file encoding of your workspace. In Eclipse, for example:

Window > Preferences > General > Workspace > Text file encoding

Change the encoding to UTF-8.
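
As a rough illustration of what "using strings normally" can look like, here is a minimal sketch. The class name ArabicLiteralDemo and its two helper methods are made up for this example and are not part of the asker's project. It assumes the source file is saved as UTF-8 and compiled with a matching -encoding setting. The Unicode-escape version is included only as a cross-check, because escapes do not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the original project.
final class ArabicLiteralDemo {

    // The first word of the asker's text, spelled with Unicode escapes, so it
    // does not depend on the encoding the compiler assumes for this file.
    static String escapedText() {
        return "\u062A\u0639\u0637\u064A";
    }

    // The same word as a plain literal. It is only correct if the compiler
    // reads this file in the encoding it was saved in (UTF-8 assumed here).
    static String literalText() {
        return "تعطي";
    }

    public static void main(String[] args) {
        // If this prints false, the source was compiled with the wrong encoding.
        System.out.println("literal read correctly: "
                + literalText().equals(escapedText()));
        // Four Arabic letters, each one char in Java and two bytes in UTF-8.
        System.out.println("chars: " + escapedText().length());
        System.out.println("UTF-8 bytes: "
                + escapedText().getBytes(StandardCharsets.UTF_8).length);
    }
}
```

If the first line prints false, the problem is most likely in how the file was saved or compiled, not in the string handling. Once the literal is read correctly it can be passed straight to JOptionPane.showMessageDialog, with no getBytes or new String(...) step in between.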

There is no magic here.
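
To see why the conversions tried in the question cannot help, the sketch below mimics new String(text.getBytes(), "UTF-8") with an explicit charset in place of the platform default. RoundTripDemo and roundTrip are names invented for this example. ISO-8859-1 stands in for a Western European Windows default code page, which is an assumption about the asker's machine. The point is only that any charset without Arabic letters behaves the same way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the original project.
final class RoundTripDemo {

    // Encodes with the given "platform default" charset and then decodes the
    // bytes as UTF-8, which is what new String(s.getBytes(), "UTF-8") does.
    static String roundTrip(String text, Charset platformDefault) {
        byte[] bytes = text.getBytes(platformDefault);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String arabic = "\u062A\u0639\u0637\u064A"; // four Arabic letters

        // ISO-8859-1 has no Arabic letters, so getBytes replaces each one
        // with '?' before the UTF-8 decode ever runs.
        System.out.println(roundTrip(arabic, StandardCharsets.ISO_8859_1)); // prints ????

        // With UTF-8 on both sides the round trip is lossless, but also pointless.
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic)); // prints true
    }
}
```

In other words, the text is already lost at the getBytes() step, so decoding as UTF-8 afterwards cannot bring it back. A Java String is already Unicode and needs no conversion before it is displayed.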

Yishagerew answered Sep 28 '22 03:09