I am trying to display the glyph corresponding to Unicode code point U+95E8. This code point belongs to the CJK (Chinese, Japanese, Korean) block.
I am trying to find out whether the glyph for this particular code point can differ between Japanese and Chinese.
When I display U+95E8 in a JTextArea, I see the "门" character on Linux and Windows. But when I display the same code point on my embedded device, the displayed character changes to a different glyph (shown in the image).
I want to know whether U+95E8 should have a uniform representation in all the CJK (Chinese, Japanese, Korean) locales, or whether it differs for some of them. Can this behavior be caused by different fonts being installed on different devices? I am sorry for my ignorance, but I am not very familiar with internationalization.
import java.awt.*;
import java.awt.event.*;
import java.util.Locale;
import javax.swing.*;

public class TextDemo extends JPanel implements ActionListener {

    public TextDemo() {
    }

    public void actionPerformed(ActionEvent evt) {
    }

    /**
     * Create the GUI and show it. For thread safety,
     * this method should be invoked from the
     * event dispatch thread.
     * @throws InterruptedException
     */
    private static void createAndShowGUI() throws InterruptedException {
        JFrame frame = new JFrame(java.util.Locale.getDefault().getDisplayName());
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        Container contentPane = frame.getContentPane();
        contentPane.setLayout(new SpringLayout());
        Dimension size = new Dimension(500, 500);
        frame.setSize(size);
        JTextArea textArea = new JTextArea();
        //Font font1 = new Font("SansSerif", Font.BOLD, 20);
        //textArea.setFont(font1);
        textArea.setEditable(true);
        textArea.setSize(new Dimension(400,400));
        textArea.setDefaultLocale(java.util.Locale.SIMPLIFIED_CHINESE);
        textArea.setText("Printing U+95E8 : \u95e8");
        contentPane.add(textArea);
        frame.setVisible(true);
    }

    public static void main (String[] args) {
        java.util.Locale.setDefault(java.util.Locale.JAPANESE);
        javax.swing.SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                try {
                    createAndShowGUI();
                } catch (InterruptedException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
        });
    }
}
The Unicode Standard contains a set of unified Han ideographic characters used in the written Chinese, Japanese, and Korean languages. The term Han, derived from the Chinese Han Dynasty, refers generally to Chinese traditional culture.
Character encodings. There are several standard methods to encode Japanese characters for use on a computer, including JIS, Shift-JIS, EUC, and Unicode. While mapping the set of kana is a simple matter, kanji has proven more difficult.
The basic block named CJK Unified Ideographs (4E00–9FFF) contains 20,992 basic Chinese characters in the range U+4E00 through U+9FFF. The block not only includes characters used in the Chinese writing system but also kanji used in the Japanese writing system and hanja, whose use is diminishing in Korea.
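As a quick check of the block membership described above, here is a minimal Java sketch that asks the JDK which Unicode block and script U+95E8 belongs to. The class name `CjkBlockCheck` and its helper methods are made up for this illustration; `Character.UnicodeBlock` and `Character.UnicodeScript` are standard JDK classes.

```java
class CjkBlockCheck {

    /** True if the code point lies in the CJK Unified Ideographs block (U+4E00..U+9FFF). */
    static boolean isCjkUnifiedIdeograph(int codePoint) {
        return Character.UnicodeBlock.of(codePoint)
                == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
    }

    /** Number of code points spanned by the range U+4E00..U+9FFF. */
    static int blockSize() {
        return 0x9FFF - 0x4E00 + 1;
    }

    public static void main(String[] args) {
        int cp = 0x95E8;
        System.out.printf("U+%04X in CJK Unified Ideographs: %b%n",
                cp, isCjkUnifiedIdeograph(cp));
        System.out.println("Script: " + Character.UnicodeScript.of(cp));
        System.out.println("Code points in block range: " + blockSize());
    }
}
```

Note that the block and the script say nothing about the language of the text: the same code point is reported as Han whether it appears in Chinese, Japanese, or Korean content.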
Unicode was originally designed as a 16-bit code to support languages that have too many characters to be represented by ASCII or EBCDIC codes; its code space has since grown to U+0000 through U+10FFFF.
Generally, CJK characters in Unicode are “unified”, which means that a single code point is used even though the character has traditionally been somewhat different for the different languages. In theory, a single font can contain multiple glyphs for a code point, with some selection mechanism. In practice, a font that contains CJK characters typically has a single design for them, reflecting the design of Traditional Chinese, Simplified Chinese, Japanese, or Korean. In this sense, some fonts might be called “Traditional Chinese”, “Japanese”, etc.
Obviously, you should select the font according to the language of the text.
The glyph in the image in the question looks somewhat odd, and it deviates from the glyphs for U+95E8 in some common fonts, which generally show rather similar designs for this character. So for this specific character, the variation can be expected to be only in the general style (e.g., serif vs. sans-serif, stroke width). It seems that the font being used is somehow oddly designed, at least for this character.
Adding to Jukka's answer:
Here is some more info on the "Han unification": http://en.wikipedia.org/wiki/Han_unification
There are two main ways one can render the glyph desired:
Now, the stuff below is very low level. When you use something like JTextArea, you have no control. You use what the implementers of JTextArea decided to do.
You can call the setDefaultLocale of your component, and that might help. It is recommended you do that, no matter what. But if you want to be sure what is going on, you take control and specify a language specific font.
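To illustrate "taking control", here is a rough sketch that sets both the locale and a language-specific font on a text component. It rests on several assumptions: the class `LocaleFontChooser` and its methods are hypothetical, the mapping from language to font is only one possible choice, and the Noto Sans CJK families are assumed to be installed on the target system. It uses the per-instance `setLocale`, since `JComponent.setDefaultLocale` is a static method that only affects components created afterwards.

```java
import java.awt.Font;
import java.util.Locale;
import javax.swing.JTextArea;

class LocaleFontChooser {

    /**
     * Hypothetical mapping from a locale to a Noto CJK family name.
     * Adjust the names to whatever fonts are actually installed.
     */
    static String fontFamilyFor(Locale locale) {
        String lang = locale.getLanguage();
        if (lang.equals("ja")) return "Noto Sans CJK JP";
        if (lang.equals("ko")) return "Noto Sans CJK KR";
        if (lang.equals("zh")) {
            String region = locale.getCountry();
            boolean traditional = region.equals("TW") || region.equals("HK")
                    || region.equals("MO") || "Hant".equals(locale.getScript());
            return traditional ? "Noto Sans CJK TC" : "Noto Sans CJK SC";
        }
        return Font.SANS_SERIF; // logical font for everything else
    }

    /** Sets the locale and a matching font on one text area. */
    static void applyLanguage(JTextArea textArea, Locale locale, int size) {
        textArea.setLocale(locale);
        // If the family is not installed, Java substitutes the "Dialog" font.
        textArea.setFont(new Font(fontFamilyFor(locale), Font.PLAIN, size));
    }

    public static void main(String[] args) {
        Locale[] samples = {
            Locale.SIMPLIFIED_CHINESE, Locale.TRADITIONAL_CHINESE,
            Locale.JAPANESE, Locale.KOREAN, Locale.ENGLISH
        };
        for (Locale l : samples) {
            System.out.println(l + " -> " + fontFamilyFor(l));
        }
    }
}
```

In the question's code this would be called as `LocaleFontChooser.applyLanguage(textArea, Locale.SIMPLIFIED_CHINESE, 20)` before `setText`. Comparing `textArea.getFont().getFamily()` with the requested name afterwards is one way to notice that the font was missing and a substitute was used.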
how can I recognize the correct font/environment in my PC that is causing "门" to be printed.
You can't do that reliably. The layers below Java might do their own fallback operations. And you can't legally distribute the Windows fonts.
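What you can do from Java is list the installed fonts that contain a glyph for the code point. The sketch below does only that; it does not tell you which font the rendering stack actually picked, because of the fallback behavior mentioned above. The class `GlyphCoverage` and its methods are hypothetical names for this example; `GraphicsEnvironment.getAllFonts()` and `Font.canDisplay(int)` are standard AWT calls.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

class GlyphCoverage {

    /** Formats a code point in the usual U+XXXX notation. */
    static String label(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    /** Returns the distinct families among the given fonts that have a glyph for the code point. */
    static List<String> familiesCovering(Font[] fonts, int codePoint) {
        List<String> families = new ArrayList<>();
        for (Font font : fonts) {
            String family = font.getFamily();
            if (font.canDisplay(codePoint) && !families.contains(family)) {
                families.add(family);
            }
        }
        return families;
    }

    public static void main(String[] args) {
        int cp = 0x95E8;
        // All fonts known to the local graphics environment.
        Font[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        System.out.println("Font families with a glyph for " + label(cp) + ":");
        for (String family : familiesCovering(installed, cp)) {
            System.out.println("  " + family);
        }
    }
}
```

Running this on both the PC and the embedded device at least shows which candidate fonts each system has for U+95E8, which can help narrow down why the rendering differs.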
So that I can install the same font in my embedded device
Don't. Use an open source, good quality font. The Noto fonts are a very good option.