PHP XMLReader: read, edit node, write with XMLWriter

I have a very large XML file (millions of records). Because of speed and memory constraints, I plan to use XMLReader and XMLWriter.

I need to read the file, fetch one record, change one of its attributes, and then save the XML again.

For testing, I created an XML file and wrote some records to it with these lines:

$doc = new XMLWriter();
$doc->openUri($xmlFile);
$doc->startDocument('1.0', 'UTF-8');
$doc->setIndent(true);          // setIndent() takes a boolean, not a width
$doc->setIndentString('    ');  // four spaces per nesting level
$doc->startElement('DBOS');
for ($r = 0; $r < 10; $r++) {
    $doc->startElement('ITEMS');
    for ($i = 0; $i < 5; $i++) {
        $doc->startElement('ITEM');
        $doc->writeAttribute('id', $r . '-' . $i);
        $doc->endElement(); // </ITEM>
    }
    $doc->endElement(); // </ITEMS>
}
$doc->endElement(); // </DBOS>
$doc->endDocument();
$doc->flush();

I read it again using this:

$reader = new XMLReader();
if (!$reader->open($xmlFile)){
    die("Failed to open '$xmlFile'");
}
while($reader->read()){
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'ITEMS') {
        $node = $reader->expand();
        $items = $node->childNodes;
        foreach ($items as $ik => $itm ){
            print $itm->textContent.'<br/>';
            // how to change the ID Attribute of a Node (DomNode) and save changes to the original XML File 
        }
        break;
    }
}
$reader->close();

My question: how do I change the id attribute of a DOMNode and save the changes to the original XML file, again using XMLWriter?

Rami.Q asked Jan 11 '14 16:01


1 Answer

How to change the id attribute of a DOMNode and save changes to the original XML file using XMLWriter again?

It does not work that way. If XMLReader and XMLWriter operate on the same file at the same time, the writer truncates the file, and the reader then reports errors and stops working.

However, you can operate on different files.

What you can do instead is read the document with XMLReader and, as you go, use XMLWriter to write a second document based on what you have read, modifying it where needed. Once you are done, rename the newly written file to the old filename.
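Without any library, the read, modify, write-elsewhere, rename steps could be sketched roughly as follows. This is only a minimal sketch: the function names copyCurrentNode and rewriteItemIds, the ".tmp" suffix, and the callback parameter are assumptions made for illustration, and DOCTYPE declarations and entity references are not copied.

```php
<?php
// Hypothetical helper (not part of PHP): copies the node the reader is
// currently positioned on to the writer. If $newId is given, it replaces the
// value of an existing "id" attribute on that element.
function copyCurrentNode(XMLReader $reader, XMLWriter $writer, ?string $newId = null): void
{
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            $isEmpty = $reader->isEmptyElement; // read before visiting attributes
            $writer->startElement($reader->name);
            if ($reader->moveToFirstAttribute()) {
                do {
                    $replace = $newId !== null && $reader->name === 'id';
                    $writer->writeAttribute($reader->name, $replace ? $newId : $reader->value);
                } while ($reader->moveToNextAttribute());
                $reader->moveToElement(); // back from the attributes to the element
            }
            if ($isEmpty) {
                $writer->endElement(); // <ITEM/> has no separate end-element node
            }
            break;
        case XMLReader::END_ELEMENT:
            $writer->endElement();
            break;
        case XMLReader::TEXT:
        case XMLReader::WHITESPACE:
        case XMLReader::SIGNIFICANT_WHITESPACE:
            $writer->text($reader->value);
            break;
        case XMLReader::CDATA:
            $writer->writeCdata($reader->value);
            break;
        case XMLReader::COMMENT:
            $writer->writeComment($reader->value);
            break;
        case XMLReader::PI:
            $writer->writePi($reader->name, $reader->value);
            break;
        // Other node types (DOCTYPE, entity references) are skipped in this sketch.
    }
}

// Hypothetical driver: streams $inputFile into a temporary file, passing the
// id of every <ITEM> through $mapId, then renames the result over the input.
function rewriteItemIds(string $inputFile, callable $mapId): void
{
    $tempFile = $inputFile . '.tmp'; // assumed name, same directory as the input

    $reader = new XMLReader();
    if (!$reader->open($inputFile)) {
        throw new RuntimeException("Failed to open '$inputFile'");
    }
    $writer = new XMLWriter();
    $writer->openUri($tempFile);
    $writer->startDocument('1.0', 'UTF-8');

    while ($reader->read()) {
        $newId = null;
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'ITEM') {
            $oldId = $reader->getAttribute('id');
            if ($oldId !== null) {
                $newId = $mapId($oldId);
            }
        }
        copyCurrentNode($reader, $writer, $newId);
    }

    $writer->endDocument();
    $writer->flush();
    $reader->close();
    unset($writer); // release the output file before renaming

    rename($tempFile, $inputFile);
}
```

A call such as rewriteItemIds('data.xml', function ($id) { return 'new-' . $id; }) would then prefix every id. Only the current node is held in memory at any time, which is the point of using the two classes together.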

Example

For an XML document like this one, modeled loosely on your question (shortened for the example; XMLReader and XMLWriter naturally make sense for really huge documents):

<DBOS>
    <ITEMS>
        <ITEM>item #1</ITEM>
        <ITEM>item #2</ITEM>
        <ITEM>item #3</ITEM>
    </ITEMS>
    <ITEMS>
        <ITEM>item #4</ITEM>
        <ITEM>item #5</ITEM>
    </ITEMS>
</DBOS>

A working code example is:

<?php
/*
 * This file is part of the XMLReaderIterator package.
 *
 * Copyright (C) 2012, 2014 hakre <http://hakre.wordpress.com>
 *
 * Example: Write XML with XMLWriter while reading from XMLReader with XMLWriterIteration
 */

require('xmlreader-iterators.php'); // require XMLReaderIterator library

$xmlInputFile  = 'data/dobs-items.xml';
$xmlOutputFile = 'php://output';

$reader = new XMLReader();
$reader->open($xmlInputFile);

$writer = new XMLWriter();
$writer->openUri($xmlOutputFile);

$iterator = new XMLWritingIteration($writer, $reader);

$writer->startDocument();

$itemsCount = 0;
$itemCount  = 0;
foreach ($iterator as $node) {
    $isElement = $node->nodeType === XMLReader::ELEMENT;


    if ($isElement && $node->name === 'ITEMS') {
        // increase counter for <ITEMS> elements and reset <ITEM> counter
        $itemsCount++;
        $itemCount = 0;
    }

    if ($isElement && $node->name === 'ITEM') {
        // increase <ITEM> counter and insert "id" attribute
        $itemCount++;
        $writer->startElement($node->name);
        $writer->writeAttribute('id', $itemsCount . "-" . $itemCount);
        if ($node->isEmptyElement) {
            $writer->endElement();
        }
    } else {
        // handle everything else
        $iterator->write();
    }
}

$writer->endDocument();

The output is then as follows (written to standard output here as an example; any valid PHP file name can be used instead):

<?xml version="1.0"?>
<DBOS>
    <ITEMS>
        <ITEM id="1-1">item #1</ITEM>
        <ITEM id="1-2">item #2</ITEM>
        <ITEM id="1-3">item #3</ITEM>
    </ITEMS>
    <ITEMS>
        <ITEM id="2-1">item #4</ITEM>
        <ITEM id="2-2">item #5</ITEM>
    </ITEMS>
</DBOS>

As this example shows, each id attribute is built from the two counter variables: the <ITEMS> counter and the <ITEM> counter.

XMLWritingIteration makes this easy because $iterator->write() deals with all other nodes and cases.

The example and code are part of the XMLReaderIterator package. There is also another example that creates a DOMDocument based on XMLReader; it is part of an answer to "How to distinguish between empty element and null-size string in DOMDocument?".
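Since the question already calls XMLReader::expand(), here is a hedged sketch of how a per-record DOM could fit into the same two-file approach: each <ITEMS> record is expanded into a small DOM tree, edited with ordinary DOM methods, and handed to the writer as raw XML. The function name rewriteRecords and its callback are assumptions for illustration, and the sketch assumes the flat <DBOS>/<ITEMS> layout from the question; anything outside the records (root attributes, comments, whitespace between records) is not preserved.

```php
<?php
// Hypothetical helper: streams <ITEMS> records from $inputFile to $outputFile,
// letting $edit modify each record as a DOMElement before it is written.
function rewriteRecords(string $inputFile, string $outputFile, callable $edit): void
{
    $reader = new XMLReader();
    if (!$reader->open($inputFile)) {
        throw new RuntimeException("Failed to open '$inputFile'");
    }

    $writer = new XMLWriter();
    $writer->openUri($outputFile);
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('DBOS'); // root element name taken from the question

    $dom = new DOMDocument();

    // Advance to the first <ITEMS> element (or to the end of the document).
    while ($reader->read() && $reader->name !== 'ITEMS') {
        // nothing to do, just skipping
    }

    while ($reader->name === 'ITEMS') {
        // Only this one record exists as a DOM tree at any time.
        $record = $dom->importNode($reader->expand(), true);
        $edit($record);
        $writer->writeRaw($dom->saveXML($record));
        $reader->next('ITEMS'); // skip the subtree, move to the next <ITEMS>
    }

    $writer->endElement(); // </DBOS>
    $writer->endDocument();
    $writer->flush();
    $reader->close();
}
```

The callback receives one <ITEMS> element and can, for instance, loop over $record->getElementsByTagName('ITEM') and call setAttribute('id', ...) on each item. Memory use then depends on the size of a single record rather than on the size of the whole file.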

hakre answered Sep 25 '22 02:09