
How to read bullets from an RTF file

Tags: java, rtf

I have an RTF file that contains some text with bullets.

I want to extract the text along with the bullets, but when I print it to the console I get junk values. How do I print exactly the same text to the console? This is what I tried:

public static void main(String[] args) throws IOException, BadLocationException {
    // Parse the RTF file into a Swing document model
    RTFEditorKit rtf = new RTFEditorKit();
    Document doc = rtf.createDefaultDocument();

    FileInputStream fis = new FileInputStream("C:\\Users\\Guest\\Desktop\\abc.rtf");
    InputStreamReader i = new InputStreamReader(fis, "UTF-8");
    rtf.read(i, doc, 0);
    // Print the plain text of the whole document
    System.out.println(doc.getText(0,doc.getLength()));
}


I assumed the junk values were caused by the console not supporting the character set, so I tried generating a PDF file instead, but the PDF shows the same junk values. This is the PDF code:

// Fragment: `document` and the font `smallNormal_11` are defined elsewhere (not shown)
Paragraph de = new Paragraph();
Phrase pde = new Phrase();
pde.add(new Chunk(getText("C:\\Users\\Guest\\Desktop\\abc.rtf"), smallNormal_11));
de.add(pde);

de.getFont().setStyle(BaseFont.IDENTITY_H);
document.add(de);

// Reads the RTF file at the given path and returns its plain text
public static String getText(String path) throws IOException, BadLocationException {
    RTFEditorKit rtf = new RTFEditorKit();
    Document doc = rtf.createDefaultDocument();

    FileInputStream fis = new FileInputStream(path);
    InputStreamReader i = new InputStreamReader(fis, "UTF-8");
    rtf.read(i, doc, 0);
    return doc.getText(0, doc.getLength());
}
rocking asked Nov 15 '16 18:11


1 Answer

Despite what you said, my guess is that it is a console encoding problem.

Anyway, you can easily check it.

Just replace this line:

    System.out.println(doc.getText(0,doc.getLength()));

with these two lines:

    PrintStream ps = new PrintStream(System.out, true, "UTF-8");
    ps.println(doc.getText(0,doc.getLength()));

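This forces the text written to standard output to be encoded as UTF-8 instead of the platform default. The console itself must still be set up to display UTF-8 for the bullets to render correctly.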
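To see what the charset passed to PrintStream actually changes, here is a minimal sketch that captures the bytes a PrintStream produces for a bullet character. The class name Utf8PrintDemo and the helpers encode and toHex are hypothetical names used only for illustration, and the bullet is assumed to be U+2022; the character in your file may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8PrintDemo {

    // Hypothetical helper: returns the bytes a PrintStream with the given
    // charset writes for the text (no line separator is appended).
    static byte[] encode(String text, String charsetName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream ps = new PrintStream(buffer, true, charsetName);
            ps.print(text);
            ps.flush();
            return buffer.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Hypothetical helper: formats bytes as space-separated hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String bullet = "\u2022"; // assumed bullet character (U+2022)

        // UTF-8 can represent the bullet: prints "UTF-8:    E2 80 A2"
        System.out.println("UTF-8:    " + toHex(encode(bullet, "UTF-8")));
        // US-ASCII cannot, so PrintStream substitutes '?': prints "US-ASCII: 3F"
        System.out.println("US-ASCII: " + toHex(encode(bullet, "US-ASCII")));

        // The same idea applied to the real console, as suggested above
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(bullet + " item");
    }
}
```

If the last line still shows junk while the hex dump shows E2 80 A2, the bytes are correct and the terminal is decoding them with a different charset.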

If the output is still wrong, I would suspect that your file is not fully RTF-compliant.
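A very rough first check for that suspicion is to confirm that the file at least starts with the RTF signature. The sketch below only tests the leading bytes and says nothing about whether the rest of the file is valid RTF; the class name RtfHeaderCheck and the method looksLikeRtf are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RtfHeaderCheck {

    // Returns true if the content begins with the "{\rtf" signature.
    // This is a sanity check only, not a validation of the whole file.
    static boolean looksLikeRtf(byte[] content) {
        byte[] signature = "{\\rtf".getBytes(StandardCharsets.US_ASCII);
        if (content.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if (content[i] != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java RtfHeaderCheck <path-to-file>");
            return;
        }
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksLikeRtf(content)
                ? "File starts with the RTF signature"
                : "File does not start with the RTF signature");
    }
}
```

A file saved as .rtf but actually written in another format (plain text, for instance) would fail this check, which would explain junk output regardless of the console encoding.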


I ran some tests, and your console code works correctly under Linux (I did not try the PDF version), but there the console is natively UTF-8.
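To compare your Windows setup with a UTF-8 Linux console, it may help to print the encodings the JVM reports. This is only a diagnostic sketch: EncodingReport and report are hypothetical names, and the native.encoding and stdout.encoding properties are only defined on newer JDKs, so they may come back as null.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {

    // Collects encoding-related settings of the running JVM.
    // Properties that this JDK does not define are reported as null.
    static Map<String, String> report() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("defaultCharset", Charset.defaultCharset().name());
        for (String key : new String[] {"file.encoding", "native.encoding", "stdout.encoding"}) {
            info.put(key, System.getProperty(key));
        }
        return info;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : report().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

If none of the reported values is UTF-8 on your machine, that is consistent with the console encoding explanation above.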

Benoit answered Nov 13 '22 23:11