I am trying to find out whether anything in a Word document has a font size of 2, but I have not been able to do this. As a first step, I've tried to read the font of each word in a sample Word document that has only one line and 7 words, but I am not getting the correct results.
Here is my code:
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
// fileStream is an InputStream opened on the .doc file
HWPFDocument doc = new HWPFDocument(fileStream);
Range range = doc.getRange();
for (int i = 0; i < range.numParagraphs(); i++) {
  Paragraph pr = range.getParagraph(i);
  // Visit every character run (a span of uniformly formatted text) in the paragraph
  for (int k = 0; k < pr.numCharacterRuns(); k++) {
     CharacterRun run = pr.getCharacterRun(k);
     System.out.println("Color: " + run.getColor());
     System.out.println("Font: " + run.getFontName());
     System.out.println("Font Size: " + run.getFontSize());
  }
}
However, the code above always reports double the font size: if the actual font size in the document is 12, it outputs 24, and if it is 8, it outputs 16.
Is this the correct way to read the font size from a Word document?
Yes, that's the correct way; the value is measured in half-points, so divide it by 2 to get the size in points.
In a docx, you'd have something like:
<w:rPr>
  <w:sz w:val="28" /> 
</w:rPr>
The ECMA-376 specification defines the unit of the sz element's val attribute as ST_HpsMeasure (Measurement in Half-Points), so the value 28 above means 14 pt.
It's the same with the binary .doc format, which HWPF supports: if you look at [MS-DOC], you'll see it also specifies the size of text in half-points.
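As a standalone illustration using only the JDK (no POI), the sketch below parses a run-properties fragment like the one above and halves each w:sz value. The class name DocxSizeReader is hypothetical, and the xmlns:w declaration is added because a bare fragment would not parse without it; in a real .docx, w:sz elements typically appear in word/document.xml (and in styles).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads w:sz values (ST_HpsMeasure, half-points)
// from WordprocessingML XML and converts them to points.
class DocxSizeReader {

    // WordprocessingML main namespace bound to the w: prefix
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the size in points of every w:sz element in the XML, in document order.
    static double[] pointSizes(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList sizes = doc.getElementsByTagNameNS(W_NS, "sz");
            double[] points = new double[sizes.getLength()];
            for (int i = 0; i < sizes.getLength(); i++) {
                Element sz = (Element) sizes.item(i);
                int halfPoints = Integer.parseInt(sz.getAttributeNS(W_NS, "val"));
                points[i] = halfPoints / 2.0; // half-points -> points
            }
            return points;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read w:sz values", e);
        }
    }

    public static void main(String[] args) {
        // The fragment from above, with a namespace declaration so it parses
        String xml = "<w:rPr xmlns:w=\"" + W_NS + "\">"
                + "<w:sz w:val=\"28\"/>"
                + "</w:rPr>";
        System.out.println(pointSizes(xml)[0]); // 14.0
    }
}
```

Namespace-aware parsing is used so the lookup matches on the actual namespace rather than on the literal "w" prefix, which a document is free to choose differently.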
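For the original goal of finding text set in 2 pt, this means looking for runs whose getFontSize() returns 4. A minimal sketch of the conversion and check, kept free of POI so it runs on its own; the class and method names are hypothetical, and the sample array stands in for values collected from run.getFontSize():

```java
// Hypothetical helper: HWPF's CharacterRun.getFontSize() reports half-points,
// so sizes are halved before comparing them with a size in points.
class HalfPointSizes {

    // Converts a half-point measurement to points (24 -> 12.0, 17 -> 8.5).
    static double toPoints(int halfPoints) {
        return halfPoints / 2.0;
    }

    // Returns true if any of the half-point sizes equals the target size in points.
    static boolean containsPointSize(int[] halfPointSizes, double targetPoints) {
        for (int hps : halfPointSizes) {
            // Exact comparison is safe here: halves of ints are exactly representable
            if (toPoints(hps) == targetPoints) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical values as they might come back from run.getFontSize()
        int[] sizes = {24, 16, 4};
        System.out.println(toPoints(24));                  // 12.0
        System.out.println(containsPointSize(sizes, 2.0)); // true (4 half-points = 2 pt)
    }
}
```

Inside the HWPF loop from the question, the same check reduces to testing run.getFontSize() == 4 for each run.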