 

Is there a Java library for converting a document from PDF to HTML?

Tags: java, html, pdf

An open-source implementation is preferred.

broundee asked Dec 11 '08 10:12



1 Answer

This is not an easy task: PDF formatting is much richer than HTML's, and you also have to extract the images and link them from the generated page.
Plain text extraction is much simpler, although still not trivial.
The sidebar of your question lists a similar one, "Converting PDF to HTML with Python", which points to a library, poppler. It is apparently written in C++, so it can perhaps be accessed from Java through JNI or JNA. That question also links to a related question which offers even more answers.
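If binding to poppler through JNI/JNA is more than you need, one alternative is to run poppler's `pdftohtml` command-line tool as a child process. Here is a minimal sketch of that idea, not a tested recipe. It assumes `pdftohtml` (shipped with poppler-utils) is installed and on the PATH; the class and method names are made up for illustration, and the exact output naming and fidelity depend on your poppler version.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name is an assumption, not part of any library.
public class PdfToHtmlSketch {

    // Builds the argument list for poppler's pdftohtml tool.
    public static List<String> buildCommand(Path pdf, Path html, boolean skipImages) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pdftohtml");   // assumption: the executable is on the PATH
        cmd.add("-noframes");   // ask for a single HTML document, no frameset
        if (skipImages) {
            cmd.add("-i");      // ignore images, i.e. text-only output
        }
        cmd.add(pdf.toString());
        cmd.add(html.toString());
        return cmd;
    }

    // Runs the tool and returns its exit code (0 normally means success).
    public static int convert(Path pdf, Path html, boolean skipImages)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, html, skipImages))
                .inheritIO()    // forward the tool's messages to our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java PdfToHtmlSketch <input.pdf> <output.html>");
            System.exit(2);
        }
        System.exit(convert(Paths.get(args[0]), Paths.get(args[1]), false));
    }
}
```

Keeping the command construction separate from the process launch makes it easy to check the arguments without having poppler installed. Expect the layout to be only an approximation of the PDF, for the reasons given above.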

PhiLho answered Nov 05 '22 16:11
