How do I read the bold and italic formatting of a Word document using POI?

Question

I am using Apache POI.

I can read the text of a .doc file using "org.apache.poi.hwpf.extractor.WordExtractor".

I have also fetched the tables using "org.apache.poi.hwpf.usermodel.Table".

But how can I get the bold/italic formatting of the text?

Thanks in advance.

Gagravarr · Accepted Answer

WordExtractor returns only the text, nothing else.

The simplest way to get both the text and the formatting of a Word document is to switch to Apache Tika. Apache Tika builds on top of Apache POI (amongst others), and offers both plain text extraction and rich extraction (XHTML with formatting).
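A minimal sketch of the Tika route, assuming tika-core plus the Tika parsers module are on the classpath. `AutoDetectParser` picks the right parser for the file and `ToXMLContentHandler` collects the XHTML it emits. Tika's Word parsing typically marks bold runs with `<b>` and italic runs with `<i>`; the class name `TikaFormattedText` and the `boldSpans` helper are hypothetical, added only to show one way of using that output.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.ToXMLContentHandler;

public class TikaFormattedText {

    // Parse any document Tika supports and return its rich XHTML rendering.
    static String toXhtml(InputStream in) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();
        ToXMLContentHandler handler = new ToXMLContentHandler();
        parser.parse(in, handler, new Metadata(), new ParseContext());
        return handler.toString();
    }

    // Collect the text inside <b>...</b> elements, dropping nested tags such as <i>.
    static List<String> boldSpans(String xhtml) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile("<b>(.*?)</b>", Pattern.DOTALL).matcher(xhtml);
        while (m.find()) {
            spans.add(m.group(1).replaceAll("<[^>]+>", ""));
        }
        return spans;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            String xhtml = toXhtml(in);
            System.out.println(xhtml);
            System.out.println("Bold spans: " + boldSpans(xhtml));
        }
    }
}
```

The regex is only a convenience for a quick look: it leaves XML entities such as `&amp;` escaped, so for anything serious you would handle the SAX events yourself instead of post-processing the string.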

Alternatively, if you want to write the code yourself, I'd suggest reviewing the code in Tika's WordExtractor, which demonstrates how to use Apache POI to get the formatting information for runs of text.
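The general idea can be approximated directly with POI's HWPF classes (in the poi-scratchpad artifact). This is not Tika's code, only a simplified sketch of the same approach: copy each CharacterRun's text and bold/italic flags, then open or close `<b>`/`<i>` tags only when the formatting changes between runs. The names `HwpfMarkup`, `Run`, `readRuns` and `toMarkup` are hypothetical.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Range;

public class HwpfMarkup {

    // One run of text with its formatting flags.
    record Run(String text, boolean bold, boolean italic) {}

    // Copy each CharacterRun's text and bold/italic flags out of the document.
    static List<Run> readRuns(HWPFDocument doc) {
        Range range = doc.getRange();
        List<Run> runs = new ArrayList<>();
        for (int i = 0; i < range.numCharacterRuns(); i++) {
            CharacterRun cr = range.getCharacterRun(i);
            runs.add(new Run(cr.text(), cr.isBold(), cr.isItalic()));
        }
        return runs;
    }

    // Emit <b>/<i> tags only when the formatting changes between runs,
    // so adjacent runs with the same formatting share one pair of tags.
    static String toMarkup(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        boolean bold = false;
        boolean italic = false;
        for (Run run : runs) {
            if (run.bold() != bold || run.italic() != italic) {
                // Close in reverse order of opening so the tags stay well nested.
                if (italic) sb.append("</i>");
                if (bold) sb.append("</b>");
                if (run.bold()) sb.append("<b>");
                if (run.italic()) sb.append("<i>");
                bold = run.bold();
                italic = run.italic();
            }
            sb.append(run.text());
        }
        if (italic) sb.append("</i>");
        if (bold) sb.append("</b>");
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0]);
             HWPFDocument doc = new HWPFDocument(in)) {
            System.out.println(toMarkup(readRuns(doc)));
        }
    }
}
```

The run text is not HTML-escaped here, and it can still contain Word control characters such as `\r` paragraph marks, which Tika's real extractor deals with.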

Darius Miliauskas · Answer

Instead of using WordExtractor, you can read the document through its Range:

...
HWPFDocument doc = new HWPFDocument(fis);
Range r = doc.getRange();
...

Range is the central class of the HWPF model. Once you have the Range, you can work with the formatting of the text: for instance, iterate through all CharacterRuns and check whether each one is italic (.isItalic()) or make it italic (.setItalic(true)). Bold works the same way with .isBold() and .setBold(true).
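Since the question is about reading rather than changing the formatting, here is a read-only sketch of the same loop. It assumes poi-scratchpad is on the classpath and a recent POI version in which HWPFDocument is Closeable; the class `ReadRunFormatting` and its `label` helper are hypothetical. It prints every run that is bold or italic, together with its formatting.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Range;

public class ReadRunFormatting {

    // Describe a run's formatting as a short label.
    static String label(boolean bold, boolean italic) {
        if (bold && italic) return "bold+italic";
        if (bold) return "bold";
        if (italic) return "italic";
        return "plain";
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0]);
             HWPFDocument doc = new HWPFDocument(in)) {
            Range r = doc.getRange();
            for (int i = 0; i < r.numCharacterRuns(); i++) {
                CharacterRun cr = r.getCharacterRun(i);
                // Only read the formatting here; nothing is modified.
                if (cr.isBold() || cr.isItalic()) {
                    System.out.println(label(cr.isBold(), cr.isItalic()) + ": " + cr.text());
                }
            }
        }
    }
}
```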

for (int i = 0; i < r.numCharacterRuns(); i++) {
    CharacterRun cr = r.getCharacterRun(i);
    if (!cr.isItalic()) {   // read the run's current formatting...
        cr.setItalic(true); // ...and change it
    }
    ...
}

...
File fon = new File(yourFilePathOut);
try (FileOutputStream fos = new FileOutputStream(fon)) {
    doc.write(fos); // save the modified document; the stream is closed automatically
}
...

This works as long as you stick with HWPF (the binary .doc format). That said, it is often more convenient to work at the level of Paragraphs.
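A sketch of the paragraph-level approach, assuming poi-scratchpad on the classpath: a Paragraph is itself a Range, so its CharacterRuns can be walked the same way. The class `ParagraphBold` and the `classify` helper, which reports whether a paragraph is entirely, partly or not at all bold, are hypothetical illustrations.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

public class ParagraphBold {

    // "all", "some" or "none", depending on how many characters are bold.
    static String classify(int boldChars, int totalChars) {
        if (totalChars == 0 || boldChars == 0) return "none";
        return boldChars == totalChars ? "all" : "some";
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0]);
             HWPFDocument doc = new HWPFDocument(in)) {
            Range range = doc.getRange();
            for (int p = 0; p < range.numParagraphs(); p++) {
                Paragraph para = range.getParagraph(p);
                int bold = 0;
                int total = 0;
                // Walk the paragraph's own CharacterRuns.
                for (int i = 0; i < para.numCharacterRuns(); i++) {
                    CharacterRun cr = para.getCharacterRun(i);
                    int len = cr.text().replace("\r", "").length(); // ignore the paragraph mark
                    total += len;
                    if (cr.isBold()) bold += len;
                }
                System.out.printf("Paragraph %d (%s bold): %s%n",
                        p, classify(bold, total), para.text().trim());
            }
        }
    }
}
```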

Tags: apache-poi, doc, italic, bold, hwpf

Asked by Sudeep nayak
