
How to read Unicode characters in Java

Tags: java, unicode

I am trying to read Unicode characters from a text file saved in UTF-8 using Java. My text file is as follows:

अ, अदेबानि ,अन, अनसुला, अनसुलि, अनफावरि, अनजालु, अनद्ला, अमा, अर, अरगा, अरगे, अरन, अराय, अलखद, असे, अहा, अहिंसा, अग्रं, अन्थाइ, अफ्रि, बियन, खियन, फियन, बन, गन, थन, हर, हम, जम, गल, गथ, दरसे, दरनै, थनै, थथाम, सथाम, खफ, गल, गथ, मिख, जथ, जाथ, थाथ, दद, देख, न, नेथ, बर, बुंथ, बिथ, बिख, बेल, मम, आ, आइ, आउ, आगदा, आगसिर

I have tried the following code:

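import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class UcharRead
{
    public static void main(String[] args)
    {
        // Decode the file as UTF-8; try-with-resources closes the reader.
        try (BufferedReader bufReader = new BufferedReader(
                new InputStreamReader(new FileInputStream("research_words.txt"), "UTF-8")))
        {
            String str;
            while ((str = bufReader.readLine()) != null)
            {
                // System.out encodes with the platform default charset.
                System.out.println(str);
            }
        }
        catch (IOException e)
        {
            // Report read failures instead of silently swallowing them.
            e.printStackTrace();
        }
    }
}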

I am getting the output as ????????????????????????. Can anyone help me?

purnendu asked Sep 11 '13 05:09


2 Answers

You are (most likely) reading the text correctly, but when you write it out, you also need to use UTF-8. Otherwise every character that cannot be represented in your default encoding will be turned into a question mark.
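To see why question marks appear, here is a minimal sketch that encodes a word from the question in a charset that cannot represent it. The class name `QuestionMarkDemo` and the helper `roundTrip` are made up for this illustration, and US-ASCII only stands in for whatever limited default encoding the console may be using. The word is written with Unicode escapes so the example does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Hypothetical helper: encodes the text to bytes and decodes it again,
    // which is roughly what happens between println and the console.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // One of the words from the question, written as six escaped code units.
        String word = "\u0905\u0939\u093F\u0902\u0938\u093E";

        // US-ASCII cannot represent Devanagari, so each char becomes '?'.
        System.out.println(roundTrip(word, StandardCharsets.US_ASCII)); // prints ??????

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(word, StandardCharsets.UTF_8).equals(word)); // prints true
    }
}
```

The file was decoded correctly in both cases; the loss happens only at the encoding step on the way out.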

Try writing it to a File instead of System.out (and specify the proper encoding):

Writer w = new OutputStreamWriter(
   new FileOutputStream("x.txt"), "UTF-8");
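A fuller sketch of the same idea, assuming Java 7 or later: read `research_words.txt` as UTF-8 and write it to `x.txt` as UTF-8, so the console encoding never gets involved. The class name `Utf8FileCopy` and the helper `copyLines` are invented for this example; only the two file names come from the question and the snippet above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8FileCopy {
    // Hypothetical helper: copies text line by line. The encodings are
    // decided by whoever created the Reader and the Writer.
    public static void copyLines(Reader in, Writer out) {
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                out.write(line);
                out.write('\n');
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path input = Paths.get("research_words.txt"); // file name from the question
        Path output = Paths.get("x.txt");             // file name from the answer
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            copyLines(reader, writer);
        } catch (IOException e) {
            System.err.println("Could not copy file: " + e);
        }
    }
}
```

If `x.txt` shows the Devanagari text when opened in a UTF-8 aware editor, the reading code was fine and only the console output was at fault.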
Thilo answered Nov 03 '22 05:11


If you are reading the text properly using UTF-8 encoding, then make sure that your console also supports UTF-8. If you are using Eclipse, you can enable UTF-8 encoding for your console via:

Run Configurations -> Common -> Encoding -> select UTF-8
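Outside Eclipse, one possible complement to the console setting is to make Java itself emit UTF-8 bytes by wrapping `System.out` in a `PrintStream` with an explicit encoding. The sketch below is an illustration, not part of the original answer: the class name `Utf8Console` and the helper `utf8Stream` are invented here. It only controls the bytes Java writes; the terminal or IDE console still has to interpret them as UTF-8 and have a font that contains the glyphs.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class Utf8Console {
    // Hypothetical helper: wraps any stream in an auto-flushing PrintStream
    // that encodes text as UTF-8 instead of the platform default.
    public static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        // Shows which encoding the JVM picked up as its default.
        System.out.println("Default charset: " + Charset.defaultCharset());

        PrintStream out = utf8Stream(System.out);
        // A word from the question, escaped so the source encoding does not matter.
        out.println("\u0905\u0939\u093F\u0902\u0938\u093E");
    }
}
```

If the default charset printed on the first line is not UTF-8, a plain `System.out.println` may produce the question marks described in the question.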


Juned Ahsan answered Nov 03 '22 04:11
