
XPath: select nodes with explicit 'xmlns' attribute

Tags: xpath

Could anyone please provide an XPath expression that selects all nodes that have an explicit 'xmlns' attribute, e.g. <html xmlns="http://www.w3.org/1999/xhtml">? //*[@xmlns] does not work because, as it turned out, XPath does not treat xmlns as an attribute.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<title>Information on accounts and cards</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta http-equiv="cache-control" content="no-cache"/>
<meta http-equiv="pragma" content="no-cache"/>
.......

I need only the 'html' node here.

Denis asked Jan 27 '12 16:01



2 Answers

This should not be possible, because

<a xmlns="http://www.org/1"> <b/> </a>

is equivalent to

<a xmlns="http://www.org/1"> <b xmlns="http://www.org/1"/> </a>
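
A quick way to check this equivalence is to parse both forms and compare the expanded element names. This is only a sketch using Python's standard xml.etree.ElementTree; the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same two documents as above: <b/> inherits the default namespace
# in the first, and redeclares it explicitly in the second.
implicit = ET.fromstring('<a xmlns="http://www.org/1"> <b/> </a>')
explicit = ET.fromstring('<a xmlns="http://www.org/1"> <b xmlns="http://www.org/1"/> </a>')

# ElementTree exposes expanded names in the form "{namespace-uri}local-name".
implicit_tags = [e.tag for e in implicit.iter()]
explicit_tags = [e.tag for e in explicit.iter()]

print(implicit_tags)                   # ['{http://www.org/1}a', '{http://www.org/1}b']
print(implicit_tags == explicit_tags)  # True
```

Once parsed, the tree keeps no trace of which element actually carried the xmlns declaration.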
choroba answered Nov 12 '22 03:11



The technically correct answer is that it's...

Not possible. You need to distinguish between the abstract document that the source text represents and the actual source text itself. XPath operates on the abstraction, not on the source text, and the location of the xmlns pseudo-attribute is only relevant in the latter.
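
To illustrate the distinction: outside XPath, a streaming parser sits closer to the source text and can report where declarations occur. The sketch below assumes Python's standard xml.etree.ElementTree, whose iterparse emits a "start-ns" event for each namespace declaration just before the "start" event of the element that carries it. The function name elements_declaring_default_ns is hypothetical, and this is not something XPath itself can do.

```python
import io
import xml.etree.ElementTree as ET

def elements_declaring_default_ns(xml_text):
    """Return (tag, uri) for each element that carries an explicit xmlns="..."."""
    found = []
    pending = []  # declarations seen since the previous element start
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            pending.append(payload)  # payload is a (prefix, uri) pair
        else:
            # payload is the element whose start tag held the pending declarations
            for prefix, uri in pending:
                if prefix == "":  # the default namespace, i.e. plain xmlns="..."
                    found.append((payload.tag, uri))
            pending = []
    return found

sample = """<html xmlns="http://www.w3.org/1999/xhtml">
    <head/>
    <body>
        <p xmlns="http://something"><p xmlns="http://something"/></p>
    </body>
</html>"""

declaring = elements_declaring_default_ns(sample)
for tag, uri in declaring:
    print(tag, uri)
```

On this sample it reports html and both p elements, including the one that merely redeclares its parent's namespace.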

However...

You could sort of fake it with the following XPath 2.0 expression:

//*[not(namespace-uri()=ancestor::*/namespace-uri())]

This selects any element that does not have an ancestor in the same namespace, which in theory means it selects every element on which a new namespace is first declared. However, it won't catch elements that redeclare a namespace an ancestor already uses. For example, consider this document:

<html xmlns="http://www.w3.org/1999/xhtml">
    <head/>
    <body>
        <p xmlns="http://something">
            <p xmlns="http://something"/>
        </p>
    </body>
</html>

The expression above selects the html element and the first p. The second p has an ancestor in the same namespace, so it's not selected, even though it specifies an xmlns.
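
If no XPath 2.0 processor is at hand, the same selection rule can be imitated with a plain tree walk. The following is a rough sketch in Python using only the standard library (xml.etree.ElementTree supports only a limited XPath subset, so the logic is written by hand); namespace_of and first_in_namespace are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

def namespace_of(elem):
    """Extract the namespace URI from a tag like '{uri}local'; '' if none."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def first_in_namespace(root):
    """Imitate //*[not(namespace-uri()=ancestor::*/namespace-uri())]."""
    selected = []

    def walk(elem, ancestor_namespaces):
        ns = namespace_of(elem)
        # Keep the element only if no ancestor is in the same namespace.
        if ns not in ancestor_namespaces:
            selected.append(elem)
        for child in elem:
            walk(child, ancestor_namespaces | {ns})

    walk(root, frozenset())
    return selected

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
    <head/>
    <body>
        <p xmlns="http://something">
            <p xmlns="http://something"/>
        </p>
    </body>
</html>"""

hits = first_in_namespace(ET.fromstring(doc))
print([e.tag for e in hits])
# ['{http://www.w3.org/1999/xhtml}html', '{http://something}p']
```

Only the html element and the outer p come back, matching the behaviour described above.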

Wayne answered Nov 12 '22 04:11
