Different representation of Unicode code points in Japanese and Chinese

I am trying to display the glyph corresponding to Unicode code point U+95E8. This code point belongs to the CJK (Chinese, Japanese, Korean) block.

I am trying to find out whether the glyph for this particular code point can differ between Japanese and Chinese.

When I display U+95E8 in a JTextArea, I see the "门" character on Linux/Windows. But when I display the same code point on my embedded device, the displayed character changes to:

[image: japanese_glyph]

I want to know whether code point U+95E8 should have a uniform representation in all the CJK (Chinese, Japanese, Korean) locales, or whether it differs for some of them. Could this behavior be caused by different fonts being installed on different devices? I am sorry for my ignorance, but I am not very familiar with internationalization.

import java.awt.*;
import java.awt.event.*;
import java.util.Locale;

import javax.swing.*;

public class TextDemo extends JPanel implements ActionListener {

    public TextDemo() {
    }

    public void actionPerformed(ActionEvent evt) {
    }

    /**
     * Create the GUI and show it.  For thread safety,
     * this method should be invoked from the
     * event dispatch thread.
     * @throws InterruptedException 
     */
    private static void createAndShowGUI() throws InterruptedException {

        JFrame frame = new JFrame(java.util.Locale.getDefault().getDisplayName());

        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

        Container contentPane = frame.getContentPane();
        contentPane.setLayout(new SpringLayout());

        Dimension size = new Dimension(500, 500);
        frame.setSize(size);
        JTextArea textArea = new JTextArea();

        //Font font1 = new Font("SansSerif", Font.BOLD, 20);
        //textArea.setFont(font1);

        textArea.setEditable(true);
        textArea.setSize(new Dimension(400,400));
        textArea.setDefaultLocale(java.util.Locale.SIMPLIFIED_CHINESE);

        textArea.setText("Printing U+95E8 : \u95e8");                
        contentPane.add(textArea);        
        frame.setVisible(true);
    }

    public static void main (String[] args) {
        java.util.Locale.setDefault(java.util.Locale.JAPANESE);
        javax.swing.SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                try {
                    createAndShowGUI();
                } catch (InterruptedException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
        });
    }
}
Yogesh asked Jul 22 '14 18:07

2 Answers

Generally, CJK characters in Unicode are “unified”, which means that a single code point is used even though the character has traditionally been somewhat different for the different languages. In theory, a single font can contain multiple glyphs for a code point, with some selection mechanism. In practice, a font that contains CJK characters typically has a single design for them, reflecting the design of Traditional Chinese, Simplified Chinese, Japanese, or Korean. In this sense, some fonts might be called “Traditional Chinese”, “Japanese”, etc.
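To illustrate the point about unification, here is a minimal sketch (the class and method names are made up for this example). It only uses the JDK's character data, not any font, and shows that the stored text is the same code point whatever the default locale is. Which glyph you finally see is decided later, by the font.

```java
import java.util.Locale;

class CodePointInfo {
    // Describe a code point using only JDK character data; no fonts are involved.
    static String describe(int codePoint) {
        return String.format(Locale.ROOT, "U+%04X block=%s script=%s",
                codePoint,
                Character.UnicodeBlock.of(codePoint),
                Character.UnicodeScript.of(codePoint));
    }

    public static void main(String[] args) {
        String text = "\u95e8";

        // The text holds the same code point under a Japanese default locale...
        Locale.setDefault(Locale.JAPANESE);
        System.out.println(describe(text.codePointAt(0)));

        // ...and under a Simplified Chinese one. Only the rendering may differ.
        Locale.setDefault(Locale.SIMPLIFIED_CHINESE);
        System.out.println(describe(text.codePointAt(0)));
    }
}
```

Both lines print the same description, so any difference on screen comes from the font that ends up being used, not from the string.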

Obviously, you should select the font according to the language of the text.

The glyph in the image in the question looks somewhat odd, and it deviates from the glyphs for U+95E8 in some common fonts, which generally show rather similar designs for this character. So for this specific character, the variation can be expected to be only in the general style (e.g., serif vs. sans-serif, stroke width). It seems that the font being used is somehow oddly designed, at least for this character.

Jukka K. Korpela answered Oct 06 '22 17:10



Adding to Jukka's answer:

Here is some more info on the "Han unification": http://en.wikipedia.org/wiki/Han_unification

There are two main ways one can render the glyph desired:

  1. Use a locale-specific font (that is, a different font for each of Traditional Chinese, Simplified Chinese, Japanese, and Korean). The designers of such fonts take care to do the right thing. This is Jukka's answer. As an example, take a look at the Noto family of fonts (http://www.google.com/get/noto/cjk.html). Download the "Language specific fonts in OTF" files:
    • The Simplified Chinese font is NotoSansHans-Regular.otf
    • The Traditional Chinese font is NotoSansHant-Regular.otf
    • The Japanese font is NotoSansJP-Regular.otf
    • The Korean font is NotoSansKR-Regular.otf
  2. Use a generic CJK font with multiple locale-specific glyphs. As an example, you can again use the CJK Noto font, the "Multilingual fonts in OTF" option. See "Script Table and Language System Record" in http://www.microsoft.com/typography/otspec/chapter2.htm. For this to work, the font must contain the language information, the text rendering engine must understand how to handle the language setting, and the API must expose it.
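For option 1, the choice of font file could be sketched roughly like this. The file names are the ones listed above; the class name, the method name, and the rule that treats the regions TW, HK and MO as Traditional Chinese when no script is given are assumptions made for this illustration, not something any library prescribes.

```java
import java.util.Locale;
import java.util.Optional;

class NotoFontChooser {
    // Map a locale to one of the language-specific Noto CJK files listed above.
    // Returns an empty Optional for locales this sketch does not cover.
    static Optional<String> fontFileFor(Locale locale) {
        String language = locale.getLanguage();
        if (language.equals("ja")) {
            return Optional.of("NotoSansJP-Regular.otf");
        }
        if (language.equals("ko")) {
            return Optional.of("NotoSansKR-Regular.otf");
        }
        if (language.equals("zh")) {
            String script = locale.getScript();
            String country = locale.getCountry();
            // Assumption: an explicit Hant script, or a TW/HK/MO region with
            // no script at all, means Traditional Chinese; everything else
            // is treated as Simplified Chinese.
            boolean traditional = script.equals("Hant")
                    || (script.isEmpty()
                        && (country.equals("TW") || country.equals("HK") || country.equals("MO")));
            return Optional.of(traditional
                    ? "NotoSansHant-Regular.otf"
                    : "NotoSansHans-Regular.otf");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(fontFileFor(Locale.SIMPLIFIED_CHINESE));
        System.out.println(fontFileFor(Locale.JAPANESE));
    }
}
```

The point of keeping this as an explicit mapping is that the application decides which design it gets, instead of relying on whatever fallback the platform happens to apply.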

Now, all of this happens at a very low level. When you use something like JTextArea, you have no control over it. You get whatever the implementers of JTextArea decided to do.

You can call setDefaultLocale on your component, and that might help. It is recommended that you do that in any case. But if you want to be sure what is going on, take control and specify a language-specific font.
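Taking control of the font might look something like the sketch below. The path fonts/NotoSansHans-Regular.otf, the class name and the helper names are placeholders for this example. Font.createFont and deriveFont are standard AWT calls, but whether the JVM on a given embedded device accepts this particular .otf file is something you would need to check there.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.io.File;
import java.io.IOException;
import javax.swing.JTextArea;

class ExplicitFontDemo {
    // Load a font from a file; fall back to the logical SansSerif font if the
    // file is missing or cannot be parsed as a font.
    static Font loadOrFallback(File fontFile, float size) {
        if (fontFile.isFile()) {
            try {
                return Font.createFont(Font.TRUETYPE_FONT, fontFile).deriveFont(size);
            } catch (IOException | FontFormatException e) {
                System.err.println("Could not load " + fontFile + ": " + e);
            }
        }
        return new Font(Font.SANS_SERIF, Font.PLAIN, Math.round(size));
    }

    // Apply the explicitly chosen font, e.g. from createAndShowGUI() on the
    // event dispatch thread.
    static void useFont(JTextArea textArea, Font font) {
        textArea.setFont(font);
    }

    public static void main(String[] args) {
        // Hypothetical location of the bundled font file.
        Font font = loadOrFallback(new File("fonts/NotoSansHans-Regular.otf"), 20f);
        System.out.println("Using font: " + font.getFontName());
        System.out.println("Can display U+95E8: " + font.canDisplay(0x95E8));
    }
}
```

If the fallback branch is taken you are back to whatever the platform provides, so on the device it is worth logging which branch was used.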

"How can I recognize the correct font/environment on my PC that is causing 门 to be printed ..."

You can't do that reliably. The layers below Java might do their own fallback operations. And you can't legally distribute the Windows fonts.
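What you can do is ask Java which installed fonts claim to cover the code point. This is only a rough diagnostic sketch (the class and method names are invented for it): it lists candidates, but it does not tell you which font Swing or the layers below it actually picked for the text.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

class FontCoverage {
    // Return the names of the given fonts that report a glyph for the code point.
    static List<String> fontsCovering(int codePoint, Font[] fonts) {
        List<String> names = new ArrayList<>();
        for (Font font : fonts) {
            if (font.canDisplay(codePoint)) {
                names.add(font.getFontName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // All fonts known to this Java runtime on this machine.
        Font[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        for (String name : fontsCovering(0x95E8, installed)) {
            System.out.println(name);
        }
    }
}
```

Running this on the PC and on the device can at least show whether the two environments have different sets of fonts that cover U+95E8.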

"... so that I can install the same font on my embedded device?"

Don't. Use an open source, good quality font. The Noto fonts are a very good option.

Mihai Nita answered Oct 06 '22 19:10
