 

Convert PDF to HTML in PHP?

Tags:

php

I want to be able to convert a PDF file to an HTML file via PHP, but am running into some trouble.

I found a basic way to do this using Saaspose, which can convert PDFs to HTML files. However, this approach has some problems, such as its use of SVGs, images, positioning, fonts, etc.

All I need is the ability to grab the text and any associated images from the PDF file, and then display them in a linear format rather than with absolute positioning.

What I mean by this is that, however the content is laid out in the PDF, I'd want to convert it to a single-column HTML file. If there were images, I'd want them returned as well.

Is this possible in PHP? I know I can simply grab the text from the PDF file, but what about grabbing images as well?

Another requirement is that everything be inline, since it's served to the client as a single file. With my current setup, I do this with the following code:

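// $html is a phpQuery document of the converted output, and $object_number
// is the number of <object><embed> elements it contains (both set up earlier).
$converted_obj = []; // inline SVG markup read from disk
$original_obj  = []; // the <embed> elements to be replaced

for ($i = 0; $i < $object_number; $i++) {
    $object = $html->find("object")->find("embed")->eq($i);
    // Read the SVG file that this <embed> points to
    $converted_obj[] = file_get_contents("Output/OutputHtml/" . $object->attr("src"));
    $original_obj[]  = $object;
}

// Replace each <embed> with its SVG markup so the SVG is inline
for ($i = 0; $i < $object_number; $i++) {
    pq($original_obj[$i])->replaceWith($converted_obj[$i]);
}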

This grabs all the SVG files and embeds them inline. Images would be easier, since I could base64-encode them.
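A minimal sketch of the base64 approach mentioned above, assuming the images are available as files on disk. The hypothetical helper `imageToDataUri()` reads a file and returns a `data:` URI that can go straight into an `<img src>` attribute, so the client needs no separate image request. The extension-to-MIME map is an assumption and only covers a few common formats.

```php
<?php
// Hypothetical helper: turn an image file into a data: URI so it can be
// embedded directly in the HTML instead of being linked.
function imageToDataUri(string $path): string
{
    // Minimal extension-to-MIME map (an assumption; extend as needed)
    $mimeTypes = [
        'png'  => 'image/png',
        'jpg'  => 'image/jpeg',
        'jpeg' => 'image/jpeg',
        'gif'  => 'image/gif',
        'svg'  => 'image/svg+xml',
    ];

    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!isset($mimeTypes[$ext])) {
        throw new InvalidArgumentException("Unsupported image type: $path");
    }

    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // data:<mime>;base64,<payload>
    return 'data:' . $mimeTypes[$ext] . ';base64,' . base64_encode($data);
}
```

For example, `'<img src="' . imageToDataUri($path) . '">'` embeds the image at `$path` in the page. Base64 makes the payload roughly a third larger, which can matter for big images.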

Charlie asked Feb 08 '13 23:02





1 Answer

Cross-platform solution using Xpdf:

Download the appropriate Xpdf tools package for your platform and unpack it into a subdirectory of your script's directory. Let's assume it's called "xpdftools".

Add code like this to your PHP script:

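$pdf_file = 'sample.pdf';
$html_dir = 'htmldir';
// pdftohtml from the unpacked Xpdf tools (use bin32 or bin64 to match your system)
$pdftohtml = 'xpdftools/bin32/pdftohtml';

// Escape each argument so paths with spaces or shell metacharacters are safe
$cmd = implode(' ', array_map('escapeshellarg', [$pdftohtml, $pdf_file, $html_dir]));

exec($cmd, $out, $ret); // $out receives the output lines, $ret the exit code
echo "Exit code: $ret";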

After a successful run, the htmldir directory will contain the converted HTML files, with each page in a separate file.
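If the generated pages position their text with inline `style` attributes, one naive way to get the single-column layout asked about in the question is to strip those attributes with PHP's built-in DOM extension. This is a sketch under that assumption: `linearizeHtml()` is a hypothetical name, it also discards any fonts and sizes set inline, and elements end up in source order, which may not match the visual reading order of the page. If the converter positions elements through CSS classes or a stylesheet instead, this will not help.

```php
<?php
// Naive sketch: remove inline style attributes (assumed to hold the absolute
// positioning) so elements fall back to normal top-to-bottom flow.
function linearizeHtml(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The XPath result is a snapshot, so changing attributes while looping is safe
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@style]') as $element) {
        $element->removeAttribute('style');
    }

    return $doc->saveHTML();
}
```

Each page could then be processed with `file_put_contents($page, linearizeHtml(file_get_contents($page)));`.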

The Xpdf tools use the following exit codes:

  • 0 - No error.
  • 1 - Error opening a PDF file.
  • 2 - Error opening an output file.
  • 3 - Error related to PDF permissions.
  • 99 - Other error.
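The exit codes above can be turned into readable errors. The sketch below wraps the `exec()` call from this answer; `xpdfExitMessage()` and `runPdfToHtml()` are hypothetical names, and collecting the output with a `*.html` glob assumes the generated pages use that extension.

```php
<?php
// Messages for the Xpdf exit codes listed in the answer
const XPDF_EXIT_MESSAGES = [
    0  => 'No error.',
    1  => 'Error opening a PDF file.',
    2  => 'Error opening an output file.',
    3  => 'Error related to PDF permissions.',
    99 => 'Other error.',
];

function xpdfExitMessage(int $code): string
{
    return XPDF_EXIT_MESSAGES[$code] ?? "Unknown exit code $code.";
}

// Hypothetical wrapper: run pdftohtml and return the generated HTML files,
// or throw with a readable message if the tool reports an error.
function runPdfToHtml(string $pdftohtml, string $pdfFile, string $htmlDir): array
{
    $cmd = implode(' ', array_map('escapeshellarg', [$pdftohtml, $pdfFile, $htmlDir]));
    exec($cmd . ' 2>&1', $out, $ret); // 2>&1 folds stderr into $out

    if ($ret !== 0) {
        throw new RuntimeException(xpdfExitMessage($ret) . "\n" . implode("\n", $out));
    }

    // Collect the .html files that were written, in natural order (2 before 10)
    $files = glob(rtrim($htmlDir, '/\\') . '/*.html') ?: [];
    natsort($files);
    return array_values($files);
}
```

For example, `$pages = runPdfToHtml('xpdftools/bin32/pdftohtml', 'sample.pdf', 'htmldir');` returns the page files in natural order, ready to be post-processed or combined into a single document.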
hindmost answered Sep 22 '22 16:09
