Converting doc, docx, pdf to HTML using PHP on Linux

Tags:

linux

php

pdf

docx

doc

I run a job search site, and I need to convert doc, docx and pdf files into HTML on a Linux CentOS server running PHP. People submit these files as resumes. So far, I have found PHPDocX to be great at converting docx to HTML, but I am stuck on doc and pdf. pdftohtml gives the error "bad color" when I run tests. As for doc, I only found wvWare, which seems complex and bulky to install.

Does anyone have any ideas on how to easily convert doc/pdf to HTML?

sam asked May 13 '11 20:05


3 Answers

The only thing I can think of is FPDF. It is intended for creating PDF files in PHP, but it can also open PDF files. Maybe you can use it as a base and develop some sort of toHTML function on top of it.

It is completely free to use and already has some extensions. It MIGHT help you.

http://www.fpdf.org

EDIT: Thanks to Pierre for this addition in the comments:

You can use FPDI: http://www.setasign.de/products/pdf-php-solutions/fpdi but the input PDF is treated just like an image.

I haven't looked at it myself yet, but this might help.

Ch33f answered Sep 16 '22 20:09


As far as .doc files go, how about trying OpenOffice/LibreOffice? Something like:
lowriter --convert-to html doc_file.doc
As far as PDF goes, if the PDF is only a graphical representation of text (for example, a scan), then you're out of luck; the best you can do is try to convert it to an image with ImageMagick. If it contains proper text, it should convert easily.
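One way to tell the two kinds of PDF apart is to see whether any text can be extracted at all. Here is a rough sketch of that idea, assuming pdftotext and pdftohtml from poppler-utils are installed. The function names and the 50-character threshold are my own assumptions for illustration, not something these tools define.

```shell
#!/bin/sh
# Sketch only: helper names and the 50-character threshold are assumptions.
# The PDF functions need pdftotext and pdftohtml (poppler-utils) on PATH.

# Read text on stdin; succeed if it has at least $1 non-whitespace characters.
has_enough_text() {
    min=${1:-50}
    count=$(( $(tr -d '[:space:]' | wc -c) ))
    [ "$count" -ge "$min" ]
}

# Succeed if the PDF at $1 appears to contain extractable text.
pdf_has_text_layer() {
    pdftotext "$1" - 2>/dev/null | has_enough_text "${2:-50}"
}

# Convert $1 to HTML (output base name $2) only when text is present.
pdf_to_html() {
    if pdf_has_text_layer "$1"; then
        pdftohtml -noframes "$1" "$2"
    else
        echo "no text layer in $1; it would need OCR or image conversion" >&2
        return 1
    fi
}
```

Calling `pdf_to_html resume.pdf resume` would then either write the HTML or report that the file looks like a scan, so image-only resumes can be routed elsewhere instead of producing empty pages.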

tmo answered Sep 20 '22 20:09


There are various tools out there already to do this, such as unoconv (http://dag.wieers.com/home-made/unoconv/) and PHPDocX (http://www.phpdocx.com/), which you've already tried.

http://www.phplivedocx.org/2009/08/13/convert-docx-doc-rtf-to-html-in-php/ looks promising.

Or you could install a portable version of LibreOffice on your server, which allows command-line conversion: https://help.libreoffice.org/Common/Starting_the_Software_With_Parameters

I'm sure there are tutorials out there (in the LibreOffice support area).
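For what it's worth, a headless command-line conversion could look roughly like the sketch below. It assumes the `soffice` binary is on PATH. `office_to_html` and the `DRY_RUN` switch are hypothetical names made up for this example, so treat it as a starting point rather than a tested setup.

```shell
#!/bin/sh
# Sketch only: office_to_html and DRY_RUN are hypothetical names.
# Assumes LibreOffice's soffice binary is on PATH.

# Convert document $1 to HTML, writing the result into directory $2
# (defaults to the current directory).
office_to_html() {
    src=$1
    outdir=${2:-.}
    # With DRY_RUN=1, only print the command instead of running it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "soffice --headless --convert-to html --outdir $outdir $src"
        return 0
    fi
    if ! command -v soffice >/dev/null 2>&1; then
        echo "soffice not found on PATH" >&2
        return 127
    fi
    soffice --headless --convert-to html --outdir "$outdir" "$src"
}
```

From PHP, the same command could presumably be run through exec(), with escapeshellarg() around the uploaded file name, since resume file names come from users and should not be trusted.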

James answered Sep 19 '22 20:09