
html 4.0 entities in XPATH queries

Tags:

xpath

I don't understand why the XPath expression:

//h3[text()='Foo › Bar']

doesn't match:

<h3>Foo &rsaquo; Bar</h3>

Does that seem right? How do I query for that markup?

Purrell asked Sep 03 '25 13:09


1 Answer

XPath does not define any escape sequences of its own. When XPath is used within XSLT (e.g. in attributes of elements of an XSLT stylesheet), entity and character references are resolved by the XML parser that reads the stylesheet, before the XPath engine ever sees the expression. If you use XPath in a non-XML context (e.g. from Java, C#, or another language) via a library, and your XPath query is a string literal in that language, the only escape processing you get is whatever the language itself applies to string literals.

If this is C# or Java, this should work:

String xpath = "//h3[text()='Foo \u203A Bar']"; // U+203A (decimal 8250) is the rsaquo character
...
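
To make that concrete, here is a minimal Java sketch using the JDK's built-in javax.xml APIs. The class name RsaquoXPathDemo and the helper countMatches are made up for illustration, and the sample document spells the character as &#8250; because a plain XML parser would reject an undeclared &rsaquo;.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
class RsaquoXPathDemo {

    // Parses the XML string and returns how many nodes the XPath selects.
    static int countMatches(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The parser turns the numeric reference &#8250; into the character U+203A.
        String xml = "<html><body><h3>Foo &#8250; Bar</h3></body></html>";

        // Java expands \u203A in the literal, so the XPath contains the real character.
        System.out.println(countMatches(xml, "//h3[text()='Foo \u203A Bar']")); // prints 1

        // XPath does no entity processing: this looks for the nine literal characters "&rsaquo;".
        System.out.println(countMatches(xml, "//h3[text()='Foo &rsaquo; Bar']")); // prints 0
    }
}
```

The second query finds nothing because, by the time XPath runs, the document holds a single character where the reference used to be, while the query still contains the unexpanded text.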

As a side note, it wouldn't work in XSLT either: XSLT stylesheets are XML, and XML doesn't define a character entity &rsaquo;. It only predefines &lt;, &gt;, &quot;, &apos; and &amp;. You'd have to either use a numeric character reference, &#8250; (decimal) or &#x203A; (hexadecimal), or declare the entity yourself in the DOCTYPE declaration of the XSLT stylesheet.
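
The DOCTYPE approach can be sketched in Java as well. This is only an illustration under assumptions: the class name EntityDeclDemo and the helper countMatches are invented, the entity is declared in an internal DTD subset of a small sample document rather than a real stylesheet, and the parser is assumed to be left at its defaults, which allow DOCTYPE declarations.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
class EntityDeclDemo {

    // Parses the XML string and returns how many nodes the XPath selects.
    static int countMatches(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The internal DTD subset declares rsaquo, so the parser can expand &rsaquo;.
        String xml = "<!DOCTYPE html [<!ENTITY rsaquo \"&#8250;\">]>"
                + "<html><body><h3>Foo &rsaquo; Bar</h3></body></html>";

        // Comparing the element's string value with '.' keeps the match independent
        // of how the parser happens to split the text into nodes.
        System.out.println(countMatches(xml, "//h3[.='Foo \u203A Bar']")); // prints 1
    }
}
```

Without the ENTITY declaration the same document would not parse at all, which is the reason the numeric reference is usually the simpler choice.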

Pavel Minaev answered Sep 05 '25 14:09