
How can I convert a docx document to html using php?

Tags: html, php, docx

I want to be able to upload an MS Word document and display it as a page on my site.

Is there any way to accomplish this?

xun asked Jan 03 '11 18:01


2 Answers

// Read a .docx file and return its text content as a string
function readDocx($filePath) {
    // Create new ZIP archive
    $zip = new ZipArchive;
    $dataFile = 'word/document.xml';
    // Open received archive file
    if (true === $zip->open($filePath)) {
        // If done, search for the data file in the archive
        if (($index = $zip->locateName($dataFile)) !== false) {
            // If found, read it to the string
            $data = $zip->getFromIndex($index);
            // Close archive file
            $zip->close();
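            // Load XML from a string, skipping errors and warnings.
            // loadXML() is an instance method, so create the document first.
            // LIBXML_NOENT and LIBXML_XINCLUDE are left out so that entities and
            // includes in an uploaded file are not expanded.
            $xml = new DOMDocument();
            $xml->loadXML($data, LIBXML_NOERROR | LIBXML_NOWARNING);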
            // Return data without XML formatting tags, with line breaks removed
            $text = str_replace(array("\r", "\n"), '', strip_tags($xml->saveXML()));
            return $text;
        }
        $zip->close();
    }
    // In case of failure return empty string
    return "";
}

ZipArchive and DOMDocument both ship with PHP (as the zip and dom extensions), so as long as those extensions are enabled you don't need to install, include, or require any additional libraries.
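
Since readDocx() strips every tag, it returns plain text with all paragraphs run together, not HTML. If you need basic markup, one possible approach is to walk the Word paragraphs (w:p) and wrap each one in a <p> element. The sketch below is only an illustration under that assumption: docxXmlToHtml() and docxFileToHtml() are hypothetical helper names, and formatting, tables, lists and images are all ignored.

```php
<?php
// Hypothetical helper: convert the XML of word/document.xml into simple HTML,
// one <p> per Word paragraph. Only the text is kept.
function docxXmlToHtml(string $documentXml): string
{
    if ($documentXml === '') {
        return '';
    }
    // Collect parser errors internally instead of emitting warnings
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($documentXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return '';
    }

    $xpath = new DOMXPath($dom);
    // Main WordprocessingML namespace used by word/document.xml
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $html = '';
    foreach ($xpath->query('//w:p') as $paragraph) {
        $text = '';
        // w:t elements hold the text of each run in the paragraph
        foreach ($xpath->query('.//w:t', $paragraph) as $run) {
            $text .= $run->textContent;
        }
        if ($text !== '') {
            // Escape the text so it is safe to place in a page
            $html .= '<p>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . "</p>\n";
        }
    }
    return $html;
}

// Hypothetical wrapper: read word/document.xml from a .docx file and convert it.
function docxFileToHtml(string $filePath): string
{
    $zip = new ZipArchive();
    if ($zip->open($filePath) !== true) {
        return '';
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    return $xml === false ? '' : docxXmlToHtml($xml);
}
```

For an uploaded file you would pass the temporary upload path to docxFileToHtml() and echo the result into your page. Anything beyond plain paragraphs (headings, bold, tables) would need extra handling of the corresponding Word elements.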

David Lin answered Oct 19 '22 15:10


One may use PHPDocX.

It supports practically all HTML and CSS styles. You can also use templates to add extra formatting to your HTML via the replaceTemplateVariableByHTML method.

The HTML methods of PHPDocX also allow for the direct use of Word styles. You may use something like this:

$docx->embedHTML($myHTML, array('tableStyle' => 'MediumGrid3-accent5PHPDOCX'));

This applies the MediumGrid3-accent5 Word style to all of your tables. Both embedHTML and its template counterpart, replaceTemplateVariableByHTML, preserve inheritance: you can start from a predefined Word style and override any of its properties with CSS.

You may also extract selected parts of your HTML using jQuery-style selectors.

Eduardo answered Oct 19 '22 14:10