How to extract the font styles of text using PDFBox?

Tags:

java

pdfbox

I am using the PDFBox library to extract text content from a PDF file. I was able to extract all the text, but couldn't find a method to extract the font styles.

Master Stroke asked Aug 04 '11 10:08


1 Answer

Extracting the text is not the right way to get the fonts. To read them, iterate through the PDF's pages and read the fonts from each page's resources, as below:

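PDDocument doc = PDDocument.load("C:/mydoc3.pdf"); // PDFBox 1.x API; getAllPages()/getFonts() are gone in 2.x
List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
for (PDPage page : pages) {
    Map<String, PDFont> pageFonts = page.getResources().getFonts();
    for (PDFont font : pageFonts.values()) {
        System.out.println(font.getBaseFont()); // style is often part of the name, e.g. Arial-BoldMT
    }
}
doc.close();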
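Listing the fonts does not give you a ready-made "style" field. A minimal sketch of one way to guess it, assuming you already have each font's base name (for example from PDFBox 1.x's `PDFont.getBaseFont()`) and, if the font has a descriptor, its raw `/Flags` integer: strip the six-letter subset prefix (such as `ABCDEF+`), look for "bold", "italic" or "oblique" in the remaining name, and also check the Italic (bit 7) and ForceBold (bit 19) flags defined by the PDF specification. `FontStyleGuess` and its methods are hypothetical helpers, not PDFBox API, and name matching is only a heuristic because PDF does not require font names to describe the style.

```java
import java.util.Locale;

/** Hypothetical helper (not PDFBox API): guesses bold/italic for a PDF font. Heuristic only. */
public class FontStyleGuess {

    // Bits of the font descriptor /Flags entry (PDF numbers bits from 1 = lowest bit).
    private static final int FLAG_ITALIC = 1 << 6;      // bit 7: Italic
    private static final int FLAG_FORCE_BOLD = 1 << 18; // bit 19: ForceBold

    /** Drops a subset tag (six uppercase letters followed by '+'), e.g. "ABCDEF+Arial" -> "Arial". */
    static String stripSubsetTag(String baseFont) {
        if (baseFont.length() > 7 && baseFont.charAt(6) == '+'
                && baseFont.substring(0, 6).chars().allMatch(c -> c >= 'A' && c <= 'Z')) {
            return baseFont.substring(7);
        }
        return baseFont;
    }

    /** Heuristic: the (tag-stripped) font name mentions bold. */
    static boolean nameSaysBold(String baseFont) {
        return stripSubsetTag(baseFont).toLowerCase(Locale.ROOT).contains("bold");
    }

    /** Heuristic: the (tag-stripped) font name mentions italic or oblique. */
    static boolean nameSaysItalic(String baseFont) {
        String name = stripSubsetTag(baseFont).toLowerCase(Locale.ROOT);
        return name.contains("italic") || name.contains("oblique");
    }

    /** Combines the name heuristic with the descriptor flags; pass 0 when there is no descriptor. */
    static String describe(String baseFont, int flags) {
        boolean bold = nameSaysBold(baseFont) || (flags & FLAG_FORCE_BOLD) != 0;
        boolean italic = nameSaysItalic(baseFont) || (flags & FLAG_ITALIC) != 0;
        if (bold && italic) return "bold italic";
        if (bold) return "bold";
        if (italic) return "italic";
        return "regular";
    }

    public static void main(String[] args) {
        System.out.println(describe("ABCDEF+Arial-BoldItalicMT", 0)); // bold italic
        System.out.println(describe("TimesNewRomanPSMT", 1 << 6));     // italic (from the flag only)
        System.out.println(describe("Helvetica", 0));                  // regular
    }
}
```

Stripping the subset tag first matters because the tag is random uppercase letters and could itself spell "BOLD". A bold font that has neither the ForceBold flag nor "Bold" in its name will still come out as "regular", so treat the result as a hint rather than a definitive answer.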
Harpreet answered Oct 03 '22 15:10