Reference - How do I handle Namespaces (Tags and Attributes with a Colon in their Name) in SimpleXML?

This question is intended as a reference to answer a particularly common question, which might take different forms:

  • I have an XML document which contains multiple namespaces; how do I parse it with SimpleXML?
  • My XML has a colon (":") in the tag name, how do I access it with SimpleXML?
  • How do I access attributes in my XML file when they have a colon in their name?

If your question has been closed as a duplicate of this, it may not be identical to these examples, but this page should tell you what you need to know.

Here is an illustrative example:

$xml = 
    <<<XML
    <?xml version="1.0" encoding="utf-8"?>
    <document xmlns="http://example.com" xmlns:ns2="https://namespaces.example.org/two" xmlns:seq="urn:example:sequences">
        <list type="short">
            <ns2:item seq:position="1">A thing</ns2:item>
            <ns2:item seq:position="2">Another thing</ns2:item>
        </list>
    </document>
    XML;
$sx = simplexml_load_string($xml);

This code will not work; why not?

foreach ( $sx->list->ns2:item as $item ) {
    echo 'Position: ' . $item['seq:position'] . "\n";
    echo 'Item: ' . (string)$item . "\n";
}

The first problem is that ->ns2:item is invalid syntax; but changing it to this doesn't work either:

foreach ( $sx->list->{'ns2:item'} as $item ) { ... }

Why not, and what should you use instead?

Asked by IMSoP, Jul 03 '17 22:07



2 Answers

What are XML namespaces?

A colon (:) in a tag or attribute name means that the element or attribute is in an XML namespace. Namespaces are a way of combining different XML formats / standards in one document, and keeping track of which names come from which format. The colon, and the part before it, aren't really part of the tag / attribute name; they just indicate which namespace it's in.

An XML namespace has a namespace identifier, which takes the form of a URI (a URL or URN). The URI doesn't need to point at anything; it's just a way for someone to "own" the namespace. For instance, the SOAP standard uses the namespace http://www.w3.org/2003/05/soap-envelope and an OpenDocument file uses (among others) urn:oasis:names:tc:opendocument:xmlns:meta:1.0. The example in the question uses the namespaces http://example.com, https://namespaces.example.org/two, and urn:example:sequences.

Within a document, or a section of a document, a namespace is given a local prefix, which is the part you see before the colon. For instance, in different documents, the SOAP namespace might be given the local prefix soap:, SOAP:, SOAP-ENV:, env:, or just ns1:. These names are linked back to the identifier of the namespace using a special xmlns attribute, e.g. xmlns:soap="http://www.w3.org/2003/05/soap-envelope". The choice of prefix in a particular document is completely arbitrary, and could change each time it was generated without changing the meaning.
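To illustrate why the prefix doesn't matter, here is a minimal sketch using two hypothetical documents (the variables $docA, $docB and $nsSoap are made up for this example) that carry the same content under different prefixes. Selecting by namespace URI should read both the same way:

```php
<?php
// Assumption: two hand-written sample documents, not real SOAP messages.
$nsSoap = 'http://www.w3.org/2003/05/soap-envelope';

$docA = '<soap:Envelope xmlns:soap="' . $nsSoap . '"><soap:Body>hello</soap:Body></soap:Envelope>';
$docB = '<env:Envelope xmlns:env="' . $nsSoap . '"><env:Body>hello</env:Body></env:Envelope>';

$bodies = [];
foreach ([$docA, $docB] as $doc) {
    $sx = simplexml_load_string($doc);
    // Select children by namespace URI, never by the document's prefix.
    $bodies[] = (string) $sx->children($nsSoap)->Body;
}

echo implode("\n", $bodies), "\n";
```

Both documents yield the same Body text, because only the URI identifies the namespace.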

Finally, there is a default namespace in each document, or section of a document, which is the namespace used for elements with no prefix. It is defined by an xmlns attribute with no :, e.g. xmlns="http://www.w3.org/2003/05/soap-envelope". In the example above, <list> is in the default namespace, which is defined as http://example.com.

Somewhat peculiarly, un-prefixed attributes are never in the default namespace; instead they sit in a kind of "void namespace", which the standard doesn't clearly define. See: XML Namespaces and Unprefixed Attributes
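As a rough sketch of what this means in SimpleXML, the un-prefixed type attribute from the question's example can be read by calling attributes() with no argument, while asking for it in the default namespace finds nothing. The variable names here are illustrative only:

```php
<?php
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<document xmlns="http://example.com" xmlns:ns2="https://namespaces.example.org/two" xmlns:seq="urn:example:sequences">
    <list type="short">
        <ns2:item seq:position="1">A thing</ns2:item>
        <ns2:item seq:position="2">Another thing</ns2:item>
    </list>
</document>
XML;

$sx = simplexml_load_string($xml);
$list = $sx->children('http://example.com')->list;

// attributes() with no argument selects attributes that have no namespace.
$type = (string) $list->attributes()->type;
echo "type: $type\n";

// The element is in the default namespace, but its un-prefixed attribute is not.
$inDefaultNs = isset($list->attributes('http://example.com')->type);
var_dump($inDefaultNs);
```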

SimpleXML gives me an empty object; what's wrong?

If you use print_r, var_dump, or similar "dump structure" functions on a SimpleXML object that uses namespaces, some of the contents will not be displayed. The content is still there, and can be accessed as described below.
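One way to convince yourself the data is still there is sketched below, using the document from the question: asXML() serialises the element including its namespaced children, and the items can be counted once the right namespace is selected. The variable names are assumptions for this example.

```php
<?php
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<document xmlns="http://example.com" xmlns:ns2="https://namespaces.example.org/two" xmlns:seq="urn:example:sequences">
    <list type="short">
        <ns2:item seq:position="1">A thing</ns2:item>
        <ns2:item seq:position="2">Another thing</ns2:item>
    </list>
</document>
XML;

$sx = simplexml_load_string($xml);
$ns2 = 'https://namespaces.example.org/two';

// A structure dump may not show the namespaced children of <list>...
print_r($sx->list);

// ...but serialising the element shows they are present in the document,
$listXml = $sx->list->asXML();
echo $listXml, "\n";

// and they can be reached once the namespace is selected.
$itemCount = count($sx->list->children($ns2)->item);
echo "items: $itemCount\n";
```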

How do you access namespaces in SimpleXML?

SimpleXML provides two main methods for using namespaces:

  • The ->children() method allows you to access child elements in a particular namespace. It effectively switches your object to look at that namespace, until you call it again to switch back, or to another namespace.
  • The ->attributes() method works in a similar way, but allows you to access attributes in a particular namespace.

For instance, the example above might become:

define('XMLNS_EG1', 'http://example.com');
define('XMLNS_EG2', 'https://namespaces.example.org/two');
define('XMLNS_SEQ', 'urn:example:sequences');

foreach ( $sx->children(XMLNS_EG1)->list->children(XMLNS_EG2)->item as $item ) {
    echo 'Position: ' . $item->attributes(XMLNS_SEQ)->position . "\n";
    echo 'Item: ' . (string)$item . "\n";
}

You can also select the initial namespace when you first parse the XML, using the $namespace_or_prefix parameter, which is the fourth parameter to simplexml_load_string, simplexml_load_file, or new SimpleXMLElement.

For instance, if we created the object this way, we wouldn't need the ->children(XMLNS_EG1) call to access the list element:

$sx = simplexml_load_string($xml, null, 0, XMLNS_EG1);

(Note that if the root element uses a default namespace rather than a prefix, SimpleXML will select it automatically; but since you can't predict which namespace will be the default in future, it's best to always include the $namespace_or_prefix parameter or initial ->children() call.)
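The following sketch shows why being explicit pays off. It parses a cut-down version of the example alongside a hypothetical regenerated version in which the default namespace and the prefixes have been swapped around; the second document and the variable names are invented for illustration. The same URI-based code should handle both:

```php
<?php
$nsEg1 = 'http://example.com';
$nsEg2 = 'https://namespaces.example.org/two';

// Original layout: http://example.com is the default namespace.
$original = <<<XML
<document xmlns="http://example.com" xmlns:ns2="https://namespaces.example.org/two">
    <list><ns2:item>A thing</ns2:item><ns2:item>Another thing</ns2:item></list>
</document>
XML;

// Hypothetical regenerated layout: same namespaces, different prefix choices.
$regenerated = <<<XML
<eg:document xmlns:eg="http://example.com" xmlns="https://namespaces.example.org/two">
    <eg:list><item>A thing</item><item>Another thing</item></eg:list>
</eg:document>
XML;

$results = [];
foreach ([$original, $regenerated] as $doc) {
    // Select the initial namespace explicitly instead of relying on the default.
    $sx = simplexml_load_string($doc, null, 0, $nsEg1);
    $items = [];
    foreach ($sx->list->children($nsEg2)->item as $item) {
        $items[] = (string) $item;
    }
    $results[] = $items;
}

print_r($results);
```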

Short-hand (not recommended)

As a short-hand, you can also pass the methods the local alias (prefix) of the namespace, by giving the second parameter as true. Remember that this prefix could change at any time: for instance, a generator might assign prefixes ns1, ns2, etc, and assign them in a different order if the code changes slightly. Relying on the full namespace URIs is always the best approach.

Using this short-hand, the code would become:

foreach ( $sx->list->children('ns2', true)->item as $item ) {
    echo 'Position: ' . $item->attributes('seq', true)->position . "\n";
    echo 'Item: ' . (string)$item . "\n";
}

(This short-hand was added in PHP 5.2, and you may see really old examples using a more long-winded version that calls $sx->getNamespaces() to get a list of prefix-identifier pairs. This is the worst of both worlds, as you're still hard-coding the prefix rather than the identifier.)

Answered Oct 27 '22 07:10 (7 revs)


Using Namespaces with XPath

SimpleXML has an xpath() method which allows you to search an element with XPath 1.0 syntax. To access namespaced nodes, you have to choose your own prefixes by calling the registerXPathNamespace() method.

Remember that even if an element doesn't have a prefix and a colon, it can be in a "default namespace" declared with xmlns.

For example:

define('XMLNS_EG2', 'https://namespaces.example.org/two');
define('XMLNS_SEQ', 'urn:example:sequences');

$sx->registerXPathNamespace('EG2', XMLNS_EG2);
$sx->registerXPathNamespace('SEQ', XMLNS_SEQ);
foreach ( $sx->xpath('//EG2:item[@SEQ:position=2]') as $item ) {
    echo 'Item: ' . (string)$item . "\n";
}

Note that the prefix you choose does not need to match what's used in the XML; it is your local alias for the namespaces you're interested in.

Note also that registerXPathNamespace has no effect on anything other than the xpath method. If you are not using XPath, you need to use children() and attributes() as discussed elsewhere on this page.

Limitations

  • XPath 1.0 doesn't have a notion of "default namespace" (and libxml2, the XML library SimpleXML is based on, doesn't support XPath 2.0), so you have to use the prefix notation on every element and attribute name you want to match.
  • The registered namespaces have to be registered on the specific object you're going to call xpath() on and are not inherited or copied to other objects. If you want to search based on different starting points, you'll have to run registerXPathNamespace every time.
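
Both limitations can be seen in a short sketch based on the document from the question. The prefixes eg1 and eg2 and the variable names are arbitrary choices made for this example:

```php
<?php
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<document xmlns="http://example.com" xmlns:ns2="https://namespaces.example.org/two" xmlns:seq="urn:example:sequences">
    <list type="short">
        <ns2:item seq:position="1">A thing</ns2:item>
        <ns2:item seq:position="2">Another thing</ns2:item>
    </list>
</document>
XML;

$nsEg1 = 'http://example.com';
$nsEg2 = 'https://namespaces.example.org/two';
$nsSeq = 'urn:example:sequences';

$sx = simplexml_load_string($xml);
$sx->registerXPathNamespace('eg1', $nsEg1);

// XPath 1.0 has no default namespace: an un-prefixed name only matches
// elements in no namespace, so <list> needs our registered prefix.
$unprefixed = $sx->xpath('//list');
$prefixed   = $sx->xpath('//eg1:list');
echo count($unprefixed), ' vs ', count($prefixed), "\n";

$positions = [];
foreach ($prefixed as $list) {
    // $list is a separate object, so register the prefix again before searching from it.
    $list->registerXPathNamespace('eg2', $nsEg2);
    foreach ($list->xpath('eg2:item') as $item) {
        $positions[] = (string) $item->attributes($nsSeq)->position;
    }
}

echo implode(', ', $positions), "\n";
```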

Answered Oct 27 '22 07:10 (2 revs)