Error converting .docx file (with .emf image background) to PDF

A Laravel-based application converts documents (.doc, .docx, .pdf, .png, .odt, .html, etc.) to PDF so that they can all be merged into a single master PDF document. It uses a combination of packages, such as PHPWord and the DOMPDF wrapper, to load and create the files. Every once in a while, the process fails on a Word file:

ERROR: PhpOffice\PhpWord\Exception\InvalidImageException: Invalid image: zip:// ... #word/media/image2.emf

The error is caused by a background image in the document that acts as a watermark. It is thrown by the PhpOffice\PhpWord\Element\Image->checkImage() method, but it occurs while the file is being loaded:

use PhpOffice\PhpWord\IOFactory;
use PhpOffice\PhpWord\Settings;
Settings::setPdfRendererName(Settings::PDF_RENDERER_DOMPDF);
$pdfWord = IOFactory::load(storage_path() . '/app/uploads/randomfile.docx', 'Word2007');
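Because the failure comes from an .emf part inside the .docx package, one option is to detect such files before calling IOFactory::load(), so the application can handle them separately instead of failing partway through a merge. This is a minimal sketch, not part of PHPWord: the function name findEmfParts() is hypothetical, it only looks at part names (file extensions) inside the archive, and it requires PHP's zip extension.

```php
<?php
// Hypothetical pre-check: list every .emf part inside a .docx (a ZIP package)
// so the caller can decide what to do before handing the file to PHPWord.
function findEmfParts(string $docxPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath");
    }

    $emfParts = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        // Compare the extension case-insensitively (image2.emf, image2.EMF, ...).
        if (strtolower(pathinfo($name, PATHINFO_EXTENSION)) === 'emf') {
            $emfParts[] = $name;
        }
    }
    $zip->close();

    return $emfParts;
}
```

In the Laravel code above, this could be called with the same storage_path() . '/app/uploads/randomfile.docx' path, and a non-empty result would flag a file that is likely to trigger the InvalidImageException.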

How can the application convert a Word document, with an EMF image embedded, to a PDF?

For more code and information on how to reproduce the error, see these issues in the PHPWord GitHub repository:

  1. Support EMF image #1480
  2. Read docx error when contains image from remote url #1173

Environment:

  • Server: Windows / IIS
  • PHP: 7.2.11
  • Laravel: 5.7.15
  • PHPWord: 0.15.0

EDIT: I also tried a different approach, to no avail. Using PHP's ZipArchive, I unzipped the .docx file, removed the EMF image (ZipArchive::deleteName()), removed the reference to the EMF image from [Content_Types].xml (ZipArchive::getFromName()), and then zipped the .docx file back up. I can open the new .docx file and see that the image is gone, but the application still throws the same PHPWord error.
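For reference, the ZipArchive steps described in the EDIT can be sketched as below. This is a minimal sketch, not the asker's actual code: the function name stripMediaFromDocx() is hypothetical, and it assumes the image's content type is registered in [Content_Types].xml through a `<Default Extension="emf" .../>` entry and that it is the only EMF part in the package. As the question notes, these steps alone did not stop the PHPWord error.

```php
<?php
// Sketch of the approach in the EDIT: delete an embedded media file from a
// .docx package and drop its content-type registration. Hypothetical helper;
// assumes the EMF is the only part using the "emf" Default entry. Needs ext-zip.
function stripMediaFromDocx(string $docxPath, string $mediaEntry): bool
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        return false;
    }

    // 1. Remove the image itself, e.g. "word/media/image2.emf".
    $zip->deleteName($mediaEntry);

    // 2. Remove the <Default Extension="emf" .../> entry from [Content_Types].xml.
    $ext = pathinfo($mediaEntry, PATHINFO_EXTENSION);
    $types = $zip->getFromName('[Content_Types].xml');
    if ($types !== false) {
        $pattern = '/<Default\s+Extension="' . preg_quote($ext, '/') . '"[^>]*\/>/i';
        // addFromString() replaces the existing entry with the edited XML.
        $zip->addFromString('[Content_Types].xml', preg_replace($pattern, '', $types));
    }

    // Closing the archive writes the changes back to the .docx file.
    return $zip->close();
}
```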

asked Dec 05 '18 20:12 by cfnerd


1 Answer

It looks like PHPWord has an open feature request for this issue:

https://github.com/PHPOffice/PHPWord/issues/1480

I think you're on the right path with altering the file. There is probably another reference to the image somewhere that you missed, and PHPWord is still trying to access it.

I would unzip the file on your local drive and grep the extracted directory for the image's file name (for example, image2.emf). This will show you every other place in the package where it is still referenced and needs to be removed.
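The same search can be done from PHP without extracting the archive. In a .docx package, images are typically linked through relationship parts (for example word/_rels/document.xml.rels, or a header's .rels file when the image is a watermark), so those are likely places for a leftover reference. This is a minimal sketch under that assumption; the function name findReferencesInDocx() is hypothetical and it requires PHP's zip extension.

```php
<?php
// PHP equivalent of grepping an unzipped .docx: return the names of all parts
// whose contents contain $needle (e.g. "image2.emf"). Hypothetical helper.
function findReferencesInDocx(string $docxPath, string $needle): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath");
    }

    $hits = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        $contents = $zip->getFromIndex($i);
        // Every part is searched as raw bytes, including binary media.
        if ($contents !== false && strpos($contents, $needle) !== false) {
            $hits[] = $name;
        }
    }
    $zip->close();

    return $hits;
}
```

Running it against the "cleaned" file with the image's file name should list whichever parts still point at the deleted media.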

answered Oct 24 '22 01:10 by Morris Buel