It seems there is no consistent way for podcasts to define their RSS feeds. I ran into one that uses different schema definitions for the RSS.
What's the best way to scan for XML namespaces in an RSS URL using XML::LibXML?
E.g.
One feed might be
<rss
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">
Another might be
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0"
xmlns:atom="http://www.w3.org/2005/Atom">
I want my script to include an assessment of all the namespaces in use, so that when parsing the RSS the appropriate field names can be tracked.
I'm not sure what that will look like yet, as I'm not sure this module is able to break the <rss> tag's attributes down into individual items the way I want.
I'm not sure I understand exactly what kind of output you're looking for, but XML::LibXML is indeed able to list the namespaces:
use warnings;
use strict;
use XML::LibXML;
my $dom = XML::LibXML->load_xml(string => <<'EOT');
<rss
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">
</rss>
EOT
for my $ns ($dom->documentElement->getNamespaces) {
    print $ns->getLocalName(), " / ", $ns->getData(), "\n";
}
Output:
content / http://purl.org/rss/1.0/modules/content/
wfw / http://wellformedweb.org/CommentAPI/
dc / http://purl.org/dc/elements/1.1/
atom / http://www.w3.org/2005/Atom
sy / http://purl.org/rss/1.0/modules/syndication/
slash / http://purl.org/rss/1.0/modules/slash/
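Note that getNamespaces only returns the declarations made on the node you call it on, and feeds are free to declare a namespace further down, e.g. on <channel>. A minimal sketch of walking every element and grouping the declared prefixes by URI follows; collect_namespaces is a name I made up for illustration, and the sample feed is invented:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): returns a hash reference
# of the form { namespace URI => { prefix => 1, ... } } covering the
# declarations found on every element in the document.
sub collect_namespaces {
    my ($dom) = @_;
    my %by_uri;
    # '//*' selects every element, including the root.
    for my $element ($dom->findnodes('//*')) {
        # getNamespaces lists only what is declared on this element.
        for my $decl ($element->getNamespaces) {
            # A default namespace (xmlns="...") has no prefix.
            my $prefix = $decl->getLocalName // '';
            $by_uri{ $decl->getData }{$prefix} = 1;
        }
    }
    return \%by_uri;
}

# Invented sample feed: one namespace on <rss>, another on <channel>.
my $dom = XML::LibXML->load_xml(string => <<'EOT');
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
  <channel xmlns:atom="http://www.w3.org/2005/Atom">
    <atom:link href="http://example.com/feed"/>
  </channel>
</rss>
EOT

my $namespaces = collect_namespaces($dom);
for my $uri (sort keys %$namespaces) {
    print join(',', sort keys %{ $namespaces->{$uri} }), " / $uri\n";
}
```

Keying on the URI rather than the prefix matters because the URI is what actually identifies a namespace; two feeds can bind the same URI to different prefixes.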
I know the OP has already accepted an answer, but for completeness' sake it should be mentioned that the recommended way to make searches on the DOM resilient to differing prefixes is to use XML::LibXML::XPathContext:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my @examples = (
<<EOT
<rss xmlns:atom="http://www.w3.org/2005/Atom">
<atom:test>One Ring to rule them all,</atom:test>
</rss>
EOT
,
<<EOT
<rss xmlns:a="http://www.w3.org/2005/Atom">
<a:test>One Ring to find them,</a:test>
</rss>
EOT
,
<<EOT
<rss xmlns="http://www.w3.org/2005/Atom">
<test>The end...</test>
</rss>
EOT
,
);
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs('atom', 'http://www.w3.org/2005/Atom');
for my $example (@examples) {
    my $dom = XML::LibXML->load_xml(string => $example)
        or die "XML: $!\n";
    for my $node ($xpc->findnodes("//atom:test", $dom)) {
        printf("%-10s: %s\n", $node->nodeName, $node->textContent);
    }
}
exit 0;
I.e., you assign your own local prefix to each namespace you are interested in, independent of whatever prefix the feed itself uses.
Output:
$ perl dummy.pl
atom:test : One Ring to rule them all,
a:test    : One Ring to find them,
test      : The end...
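Applied to the podcast case, the same idea might look like the sketch below. The prefixes in %known are my own choice, the URIs are taken from the question's examples, and the sample feed (which deliberately uses the prefix "it" instead of "itunes") is invented:

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumed lookup table: my own prefix => namespace URI.
my %known = (
    atom   => 'http://www.w3.org/2005/Atom',
    itunes => 'http://www.itunes.com/dtds/podcast-1.0.dtd',
    dc     => 'http://purl.org/dc/elements/1.1/',
);

# Register every known namespace under my own prefix.
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs($_, $known{$_}) for keys %known;

# Invented sample feed that binds the iTunes URI to a different prefix.
my $dom = XML::LibXML->load_xml(string => <<'EOT');
<rss xmlns:it="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
  <channel>
    <title>Example</title>
    <it:author>Someone</it:author>
  </channel>
</rss>
EOT

# Count the elements in each known namespace to see which ones this
# feed actually uses (attributes are not counted here).
my %used;
for my $prefix (sort keys %known) {
    my $count = $xpc->findvalue("count(//${prefix}:*)", $dom);
    $used{$prefix} = $count if $count > 0;
}
print "uses: ", join(', ', sort keys %used), "\n";

# Query with my own prefix, regardless of the one used in the feed.
my $author = $xpc->findvalue('/rss/channel/itunes:author', $dom);
print "author: $author\n";
```

Counting elements per namespace tells you what the feed really uses, which can differ from what it merely declares on the <rss> tag.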