
Converting from PDF to HTML [closed]


Is there a .dll I can use that takes a PDF file as input and produces an HTML file as output? I want to convert from PDF to HTML. My colleague says it's very difficult to go step by step, extracting text, fonts, images, margins, links, etc. from the PDF and then creating a new HTML file with the same content. He says it's nearly impossible. So I was wondering: is there a DLL I can add as a reference to do this?

petko_stankoski asked Nov 14 '11 15:11





1 Answer

Writing a program to do it is definitely not trivial. If you can't find a .NET library that does this (I couldn't, at least not a free one), I would just download PDFToHtml and invoke it programmatically to get my HTML.
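To make "invoke it programmatically" concrete, here is a minimal sketch of shelling out to a command-line converter. It is written in Python for brevity; in .NET the same pattern would go through `System.Diagnostics.Process`. Everything specific here is an assumption rather than something from the answer: that the executable is called `pdftohtml` and is on the PATH, that it accepts a `-noframes` flag (as the Poppler build of pdftohtml does), and that it writes `<base>.html`. The function names `build_command` and `convert` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, out_base, tool="pdftohtml"):
    """Return the argument list for one conversion run."""
    # Assumption: -noframes asks the tool for a single HTML file
    # instead of a frameset. Check your build's --help output.
    return [tool, "-noframes", str(pdf_path), str(out_base)]


def convert(pdf_path, out_dir, tool="pdftohtml"):
    """Run the external converter and return the expected HTML path."""
    pdf_path = Path(pdf_path)
    # Fail early with a clear error if the tool is not installed.
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} was not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_base = out_dir / pdf_path.stem
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(build_command(pdf_path, out_base, tool), check=True)
    # Assumption: the tool names its output <base>.html.
    return out_dir / (pdf_path.stem + ".html")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.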

If you have the time to spare and/or PDFToHtml does not produce acceptable output for you, you could use iText to write the program yourself. It's a very mature, free PDF library. I've used it in the past to manipulate PDFs (merge, create, etc.).

UPDATE

As noted in the comment by Quandary, the PDFSharp library offers a more relaxed license (MIT) than the commercial or AGPL licenses offered by iText. Keep this in mind when choosing your library. I have not used PDFSharp myself, and I don't know how the two compare in terms of functionality.

Icarus answered Sep 25 '22 07:09
