I want to be able to convert a PDF file to an HTML file via PHP, but am running into some trouble.
I found a basic way to do this using Saaspose, which lets you convert PDFs to HTML files. There are some problems with this, however, such as the use of SVGs, images, positioning, fonts, etc.
All I need is the ability to grab the text from the PDF file and any images associated with it, and then display them in a linear format rather than with absolute positioning.
What I mean by this is that if the PDF looks like this:
I'd want to convert it to a single column design HTML file. If there were images, I'd want them returned as well.
Is this possible in PHP? I know I can simply grab the text from the PDF file, but what about grabbing images as well?
Another problem is that I want everything to be inline, as it's being served to the client in a single file. Currently, I can do this with my setup through some code:
// Collect each embedded SVG's markup together with the element that references it.
for ($i = 0; $i < $object_number; $i++) {
    $object = $html->find("object")->find("embed")->eq($i);
    $embed = file_get_contents("Output/OutputHtml/" . $object->attr("src"));
    array_push($converted_obj, $embed);
    array_push($original_obj, $object);
}
// Replace each referencing element with the inline SVG markup.
for ($i = 0; $i < $object_number; $i++) {
    pq($original_obj[$i])->replaceWith($converted_obj[$i]);
}
Which grabs all the SVG files and displays them inline. Images would be easier for this, as I could use base64.
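For the base64 idea mentioned above, a minimal sketch might look like the following. The function names `to_data_uri` and `inline_img_tag` are hypothetical helpers, not part of any library, and the MIME type is assumed to be known by the caller.

```php
<?php
// Hypothetical helper: turn raw image bytes into a data: URI.
function to_data_uri(string $bytes, string $mime): string
{
    return 'data:' . $mime . ';base64,' . base64_encode($bytes);
}

// Hypothetical helper: build an <img> tag that embeds the image directly,
// so no separate image file has to be served to the client.
function inline_img_tag(string $bytes, string $mime, string $alt = ''): string
{
    $src = to_data_uri($bytes, $mime);
    return '<img src="' . $src . '" alt="' . htmlspecialchars($alt, ENT_QUOTES) . '">';
}
```

The bytes could come from `file_get_contents()` on an extracted image, after which the resulting tag can replace the original element in the same way as the SVG loop above.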
PHP can be converted to HTML using a simple scripting tool written in Python. The conversion turns PHP scripts into static HTML pages, so an entire PHP website can be converted to a static HTML website while it is running on localhost.
Note: PHP is not actually reading the PDF file. It does not recognize the file as a PDF; it only passes the PDF file to the browser to be read there.
Cross-platform solution using Xpdf:
Download the appropriate package of the Xpdf tools and unpack it into a subdirectory of your script's directory. Let's assume it's called "xpdftools".
Add the following code to your PHP script:
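$pdf_file = 'sample.pdf';
$html_dir = 'htmldir';
// Escape the arguments so spaces or shell metacharacters in the paths cannot break the command.
$cmd = "xpdftools/bin32/pdftohtml " . escapeshellarg($pdf_file) . " " . escapeshellarg($html_dir);
exec($cmd, $out, $ret);
// $ret holds the tool's exit code; 0 means success.
echo "Exit code: $ret";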
After successful script execution, the htmldir directory will contain the converted HTML files (each page in a separate file).
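If a single-column layout is the goal rather than positioned output, one option is to start from plain text instead. The sketch below assumes the text has already been extracted (for example with the pdftotext tool that ships in the same Xpdf package) and is passed in as a string. The function name `text_to_single_column_html` is hypothetical, and treating blank lines as paragraph breaks is an assumption about the extracted text.

```php
<?php
// Hypothetical helper: wrap plain extracted text in simple, flowing HTML paragraphs.
function text_to_single_column_html(string $text): string
{
    // Normalise line endings; treat form feeds (often used as page breaks) as paragraph breaks.
    $text = str_replace(["\r\n", "\r", "\f"], ["\n", "\n", "\n\n"], $text);

    // One or more blank lines mark a paragraph boundary (an assumption about the input).
    $paragraphs = preg_split('/\n\s*\n/', trim($text));

    $html = '';
    foreach ($paragraphs as $p) {
        // Collapse line breaks and repeated spaces inside a paragraph.
        $p = trim(preg_replace('/\s+/', ' ', $p));
        if ($p !== '') {
            // Escape the text so characters such as < and & display literally.
            $html .= '<p>' . htmlspecialchars($p, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . "</p>\n";
        }
    }
    return $html;
}
```

The output has no positioning at all, so the browser lays the paragraphs out in a single column by default.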
The Xpdf tools use the following exit codes:
0 - No error.
1 - Error opening a PDF file.
2 - Error opening an output file.
3 - Error related to PDF permissions.
99 - Other error.
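To report something more useful than the bare number, the exit codes listed above could be mapped to their messages. This is only a sketch; `xpdf_exit_message` is a hypothetical helper name, and the fallback text for unlisted codes is an assumption.

```php
<?php
// Hypothetical helper: map an Xpdf tool exit code to the message from the list above.
function xpdf_exit_message(int $code): string
{
    $messages = [
        0  => 'No error.',
        1  => 'Error opening a PDF file.',
        2  => 'Error opening an output file.',
        3  => 'Error related to PDF permissions.',
        99 => 'Other error.',
    ];
    // Codes outside the documented list get a generic fallback message.
    return $messages[$code] ?? "Unknown exit code: $code";
}
```

After `exec($cmd, $out, $ret);` the script could then print `xpdf_exit_message($ret)` whenever `$ret` is not 0.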