
How to read font size of each word in a word document using POI?

I am trying to find out whether a Word document contains any text with a font size of 2, but I have not been able to do this. To begin with, I tried to read the font of each word in a sample Word document that has only one line and 7 words. I am not getting the correct results.

Here is my code:

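// Requires Apache POI (poi and poi-scratchpad) on the classpath.
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

HWPFDocument doc = new HWPFDocument(fileStream);
WordExtractor we = new WordExtractor(doc);
Range range = doc.getRange();
String[] paragraphs = we.getParagraphText();
for (int i = 0; i < paragraphs.length; i++) {
  Paragraph pr = range.getParagraph(i);
  int k = 0;
  // Walk the character runs until one ends where the paragraph ends.
  while (true) {
     CharacterRun run = pr.getCharacterRun(k++);
     System.out.println("Color: " + run.getColor());
     System.out.println("Font: " + run.getFontName());
     System.out.println("Font Size: " + run.getFontSize());
     if (run.getEndOffset() == pr.getEndOffset())
       break;
  }
}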

However, the above code always doubles the font size: if the actual font size in the document is 12, it outputs 24, and if the actual font size is 8, it outputs 16.

Is this the correct way to read the font size from a Word document?

Anthony asked Jul 11 '13 03:07


1 Answer

Yes, that's the correct way; the measurement is in half points.

In a docx, you'd have something like:

<w:rPr>
  <w:sz w:val="28" />
</w:rPr>

The ECMA-376 spec defines the unit of @sz as ST_HpsMeasure (Measurement in Half-Points).

It's the same with the binary doc format, which HWPF supports. If you look at [MS-DOC], you'll see it also specifies the size of text in half-points.
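To make the half-point arithmetic concrete, here is a minimal sketch in plain Java with no POI dependency. The class name HalfPoints and its two helpers are hypothetical names introduced for illustration only; the sketch assumes you pass in the raw value that run.getFontSize() printed in the question's code.

```java
// Hypothetical helper (not part of Apache POI): converts raw half-point
// values, such as those printed by the question's code, into points.
public final class HalfPoints {
    private HalfPoints() {}

    /** Converts a size in half-points to points, e.g. 24 becomes 12.0. */
    public static double toPoints(int halfPoints) {
        return halfPoints / 2.0;
    }

    /** Returns true if the raw half-point value equals the given point size. */
    public static boolean isPointSize(int halfPoints, double points) {
        return halfPoints == Math.round(points * 2);
    }

    public static void main(String[] args) {
        System.out.println(toPoints(24));      // prints 12.0
        System.out.println(toPoints(28));      // prints 14.0 (the w:sz example)
        System.out.println(isPointSize(4, 2)); // prints true
    }
}
```

Under that assumption, text set in a 2-point font would show up as a raw value of 4, so the check from the question could be written as HalfPoints.isPointSize(run.getFontSize(), 2).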

JasonPlutext answered Oct 22 '22 14:10