Ignore namespace with XPath in PHP

I want to pull some tags from an XML file. The XML file might look like this:

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/ http://www.mediawiki.org/xml/export-0.10.xsd" version="0.10" xml:lang="de">
[... some more tags ...]
  <page>
    <title>Title 1</title>
    [... some more tags ...]
  </page>
  <page>
    <title>Title 2</title>
    [... some more tags ...]
  </page>
</mediawiki>

When I use https://www.freeformatter.com/xpath-tester.html to evaluate "//title", everything works and I receive the two titles.

But when I use the following PHP:

$xml = simplexml_load_file('articles.xml');
$result = $xml->xpath('//title');
var_dump($result);

the resulting array is empty.

I already checked many of the similar questions and found that it works if I call registerXPathNamespace with the same URL. However, the XML files I am reading come from several external sources running different software (the above is only one possible example), and they might change at any time. So every time I open an XML file I would need to read out the URL and pass it to registerXPathNamespace. Another option would be to strip the xmlns from the XML. Both options seem pretty complicated if all I want is to extract the "title" (and some other) tags no matter what the namespace is.

Is there a simple way to tell XPath to ignore the namespace? (And if there is no way to ignore it, what would be the simplest and most durable solution to avoid the problem of changing URLs?)

So far I have been using the hard-coded loop

foreach ($xml->page as $page) {
  $title = $page->title;
  //[... do something ...]
}

which works. But I thought XPath would be handy (more flexible, not hard-coded, more durable) and wanted to give it a try.

A806 asked Sep 14 '25 13:09


1 Answer

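You can fetch the namespaces declared in the document and register the default one under a prefix of your choice. The slightly awkward part is that getDocNamespaces() returns the default namespace under a blank key (an empty string), which is why the code takes the first value from the array rather than looking it up by name. That assumes the default namespace is the first one declared on the root element, as it is in your example.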

So the code is something like:

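$xml = simplexml_load_file('articles.xml');
// Namespaces declared on the root element, as prefix => URI (the default one has key '')
$ns = $xml->getDocNamespaces();
// Register the first declared namespace under the arbitrary prefix 'def'
$xml->registerXPathNamespace('def', array_values($ns)[0]);
$result = $xml->xpath('//def:title');
var_dump($result);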
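
If you would rather not register anything at all, another common route is to match on local-name(), which compares only the element name and so does not care which namespace URI the source uses. The following is a minimal sketch under a few assumptions: the inline XML is a cut-down, hypothetical stand-in for articles.xml, and $source and $titles are names introduced here purely for illustration.

```php
<?php
// Hypothetical stand-in for articles.xml; the namespace URI could be anything.
$source = <<<'XML'
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" version="0.10" xml:lang="de">
  <page>
    <title>Title 1</title>
  </page>
  <page>
    <title>Title 2</title>
  </page>
</mediawiki>
XML;

$xml = simplexml_load_string($source);

// local-name() looks only at the element name without its namespace,
// so no URI has to be read out or registered beforehand.
$result = $xml->xpath('//*[local-name()="title"]');

// Cast each SimpleXMLElement to a string to get its text content.
$titles = array_map(function ($element) {
    return (string) $element;
}, $result);

print_r($titles);
```

The trade-off is that this matches every element named title regardless of namespace, so if a feed ever mixes vocabularies you may want a more specific path such as //*[local-name()="page"]/*[local-name()="title"].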
Nigel Ren answered Sep 17 '25 04:09