
Problem editing word file in PHP

So I need to edit some text in a Word document. I created a Word document and saved it as XML. It is saved correctly (I can open the XML file in MS Word and it looks exactly like the docx original).

So then I use PHP DOM to edit some text in the file (just two lines). EDIT: below is the already fixed, working version:

<?php

$firstName = 'Richard';
$lastName = 'Knop';

$xml = file_get_contents('template.xml');

$doc = new DOMDocument();
$doc->loadXML($xml);
$doc->preserveWhiteSpace = false;

$wts = $doc->getElementsByTagNameNS('http://schemas.openxmlformats.org/wordprocessingml/2006/main', 't');

$c1 = 0; $c2 = 0;
foreach ($wts as $wt) {

    if (1 === $c1) {
        $wt->nodeValue .= ' ' . $firstName;
        $c1++;
    }

    if (1 === $c2) {
        $wt->nodeValue .= ' ' . $lastName;
        $c2++;
    }

    if ('First Name' === substr($wt->nodeValue, 0, 10)) {
        $c1++;
    }

    if ('Last Name' === substr($wt->nodeValue, 0, 9)) {
        $c2++;
    }

}

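// Serialize the modified document; otherwise the unmodified input would be written
$xml = $doc->saveXML();

// Replace UNIX with DOS line endings
$xml = str_replace("\n", "\r\n", $xml);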

$fp = fopen('final-xml.xml', 'w');
fwrite($fp, $xml);
fclose($fp);

This gets executed properly (no errors). These two lines:

<w:t>First Name:</w:t>
<w:t>Last Name:</w:t>

Get replaced with these:

<w:t>First Name: Richard</w:t>
<w:t>Last Name: Knop</w:t>

However, when I try to open the final-xml.xml file in MS Word, it doesn't open (Word freezes). Any suggestions?

EDIT:

I tried using levenshtein():

$xml = file_get_contents('template.xml');
$xml2 = file_get_contents('final-xml.xml');

$str = str_split($xml, 255);
$str2 = str_split($xml2, 255);

$i = 0;
foreach ($str as $s) {
    $dist = levenshtein($s, $str2[$i]);
    if (0 <> $dist) {
        echo $dist, '<br />';
    }
    $i++;
}

This printed nothing, which is weird: when I open the final-xml.xml file in Notepad, I can clearly see that those two lines have changed.
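
As a cross-check on the chunked levenshtein() comparison, a direct byte-level comparison can report where two strings first diverge. The following is a minimal sketch, not part of the original post; the function name firstDifference is hypothetical, and it relies on PHP's string XOR operator, which yields NUL bytes wherever the two inputs match.

```php
<?php

/**
 * Hypothetical helper: return the byte offset of the first difference
 * between two strings, or null if they are identical.
 */
function firstDifference(string $a, string $b): ?int
{
    // XOR of two strings is truncated to the shorter length and contains
    // "\0" at every position where the bytes are equal.
    $common = min(strlen($a), strlen($b));
    $same = strspn($a ^ $b, "\0");

    if ($same < $common) {
        // The bytes differ somewhere inside the common prefix.
        return $same;
    }

    // One string is a prefix of the other, or both are equal.
    return strlen($a) === strlen($b) ? null : $common;
}
```

To apply it to the files from the question, pass in the results of file_get_contents('template.xml') and file_get_contents('final-xml.xml'); a null result would mean the two files are byte-for-byte identical.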

EDIT2:

Here is the template.xml file: http://uploading.com/files/61b2922b/template.xml/

Richard Knop asked Jul 19 '10 07:07



1 Answer

This is a line-ending problem (DOS vs. UNIX). Word 2007 does not tolerate bare \n line endings and requires \r\n, whereas Word 2010 is more tolerant and accepts both.

To fix the problem, make sure you replace all UNIX line breaks with DOS ones before saving the output file:

$xml = str_replace("\n", "\r\n", $xml); 
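
The one-line replacement above assumes the string contains only bare \n line breaks; if some \r\n sequences were already present, they would become \r\r\n. A minimal sketch of a more defensive variant follows. The function name toCrlf is hypothetical and not part of the original answer.

```php
<?php

/**
 * Hypothetical helper: convert every line break (\r\n, \r or \n) to \r\n
 * without doubling carriage returns that are already present.
 */
function toCrlf(string $text): string
{
    // Match \r\n first so an existing CRLF pair counts as one line break.
    // Fall back to the input if preg_replace reports an error (null).
    return preg_replace('/\r\n|\r|\n/', "\r\n", $text) ?? $text;
}
```

Running the function twice on the same string leaves it unchanged, which makes it safe to apply even when the origin of the line endings is unknown.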

Full sample:

<?php

$firstName = 'Richard';
$lastName = 'Knop';

$xml = file_get_contents('template.xml');

$doc = new DOMDocument();
$doc->loadXML($xml);
$doc->preserveWhiteSpace = false;

$wts = $doc->getElementsByTagNameNS('http://schemas.openxmlformats.org/wordprocessingml/2006/main', 't');

foreach ($wts as $wt) {
    // Debug output: print the text of each <w:t> element
    echo $wt->nodeValue;

    if ('First Name:' === $wt->nodeValue) {
        $wt->nodeValue = 'First Name: ' . $firstName;
    }

    if ('Last Name:' === substr($wt->nodeValue, 0, 10)) {
        $wt->nodeValue = 'Last Name: ' . $lastName;
    }
}

$xml = $doc->saveXML();

// Replace UNIX with DOS line endings
$xml = str_replace("\n", "\r\n", $xml); 

$fp = fopen('final-xml.xml', 'w');
fwrite($fp, $xml);
fclose($fp);
?>
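
The same approach can be wrapped in a reusable function. The sketch below is an illustration under stated assumptions, not the answer's original code: the function name fillLabels and its label-to-value array parameter are hypothetical, and it assumes each label sits alone in a single <w:t> element, as in the template discussed here.

```php
<?php

// WordprocessingML namespace URI, as used in the sample above.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/**
 * Hypothetical helper: append a value to every <w:t> whose text equals a label,
 * then return the document with DOS (\r\n) line endings.
 *
 * @param string $xml    Word XML as a string
 * @param array  $values map of label text (e.g. 'First Name:') to the value to append
 */
function fillLabels(string $xml, array $values): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    foreach ($doc->getElementsByTagNameNS(W_NS, 't') as $wt) {
        $label = trim($wt->textContent);
        if (!isset($values[$label])) {
            continue;
        }

        // Replace the element's children with a single text node so that
        // special characters such as "&" are escaped on output.
        while ($wt->firstChild) {
            $wt->removeChild($wt->firstChild);
        }
        $wt->appendChild($doc->createTextNode($label . ' ' . $values[$label]));
    }

    // Replace UNIX with DOS line endings before returning.
    return str_replace("\n", "\r\n", $doc->saveXML());
}
```

A call such as fillLabels($xml, ['First Name:' => 'Richard', 'Last Name:' => 'Knop']) would then return a string ready to be written to final-xml.xml.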
Dirk Vollmar answered Oct 22 '22 11:10
