How can I get H1,H2,H3 contents in one single xpath expression?
I know I could do this.
//html/body/h1/text()
//html/body/h2/text()
//html/body/h3/text()
and so on.
To break it down, remember:
H1 = main keywords and subject matter: what the overall post is about.
H2 = sections that break up the content, using keywords similar to those in the H1 tag.
H3 = subcategories that further break up the content, making it easy to scan.
H1 is usually used for primary headers, h2 for subheaders, h3 for sub-subheaders, etc. It doesn't really matter what order you use them in.
Use:
/html/body/*[self::h1 or self::h2 or self::h3]/text()
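To see what that expression selects, here is a small sketch in Python. Python's standard-library ElementTree supports only a limited XPath subset (it has no self:: axis), so the sketch applies the same location steps by hand. The sample document and the heading_texts helper are hypothetical. It shows that only h1/h2/h3 elements that are direct children of body are matched, in document order, and that text() yields each heading's own text nodes rather than its full text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; in practice this would be your parsed page.
DOC = """<html><body>
<h1>Main title</h1>
<p>Intro</p>
<h2>Section <em>one</em> continues</h2>
<div><h3>Nested heading</h3></div>
<h3>Subsection</h3>
</body></html>"""

WANTED = {"h1", "h2", "h3"}  # no-namespace names, as unprefixed self::h1 etc. match


def heading_texts(root):
    """Emulate /html/body/*[self::h1 or self::h2 or self::h3]/text().

    ElementTree's XPath subset has no self:: axis, so the predicate is
    applied by hand to each element child of /html/body, in document order.
    """
    results = []
    if root.tag != "html":            # the /html step
        return results
    for body in root.findall("body"):  # the /body step
        for child in body:            # the '*' step: element children only
            if child.tag in WANTED:   # the [self::h1 or ...] predicate
                # text() = the element's own text nodes, not its descendants' text:
                # ElementTree keeps them in .text and in each child's .tail
                if child.text is not None:
                    results.append(child.text)
                for grandchild in child:
                    if grandchild.tail is not None:
                        results.append(grandchild.tail)
    return results


root = ET.fromstring(DOC)
print(heading_texts(root))
```

The h3 inside the div is skipped because /html/body/* only looks at direct children of body, and the h2 contributes two separate text nodes because text() does not descend into the em element. If you want each heading's complete text, select the elements themselves (drop /text()) and join their text in the host language, e.g. "".join(el.itertext()) with ElementTree.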
The following expression is incorrect:
//html/body/*[local-name() = "h1"
or local-name() = "h2"
or local-name() = "h3"]/text()
because local-name() ignores the namespace, so it may also select text nodes that are children of unwanted:h1, different:h2, or someWeirdNamespace:h3.
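A minimal sketch of the namespace problem, again emulated with Python's ElementTree rather than a real XPath engine. ElementTree stores a namespaced tag such as unwanted:h1 as "{urn:example:unwanted}h1". The namespace URI, the sample document, and the select/local_name helpers are all hypothetical. Comparing only the local part mimics local-name() = "h1", while comparing the full tag mimics an unprefixed self::h1, which in XPath 1.0 matches only elements in no namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing plain headings with a namespaced "h1".
DOC = """<html xmlns:unwanted="urn:example:unwanted"><body>
<h1>Plain h1</h1>
<unwanted:h1>Namespaced h1</unwanted:h1>
<h2>Plain h2</h2>
</body></html>"""

WANTED = {"h1", "h2", "h3"}


def local_name(tag):
    """Strip the '{namespace-uri}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def select(root, by_local_name):
    """Collect the text of heading children of body.

    by_local_name=True  ~ *[local-name() = "h1" or ...]  (namespace ignored)
    by_local_name=False ~ *[self::h1 or ...]             (no-namespace only)
    """
    body = root.find("body")
    out = []
    for child in body:
        name = local_name(child.tag) if by_local_name else child.tag
        if name in WANTED and child.text is not None:
            out.append(child.text)
    return out


root = ET.fromstring(DOC)
print(select(root, by_local_name=True))   # also picks up unwanted:h1
print(select(root, by_local_name=False))  # plain h1 and h2 only
```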
Another recommendation: always avoid using // when the structure of the XML document is statically known. Using // most often results in significant inefficiencies, because it causes the complete document (sub)tree rooted in the context node to be traversed.
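The cost difference can be illustrated with a rough model that counts how many elements each approach has to test. This is a simplification: the document and both helper functions are hypothetical, and real XPath engines may optimize //, so the counts are not measured run times. descendant_search models a //h1-style walk over the whole tree, while path_search follows /html/body/h1 one child step at a time.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one top-level heading followed by lots of nested markup.
DOC = ("<html><body><h1>Title</h1>"
       + "<div><div><p>x</p></div></div>" * 500
       + "</body></html>")


def descendant_search(root, tag):
    """Model of //tag: test every element in the tree, counting each test."""
    visited, hits = 0, []
    for el in root.iter():  # depth-first walk that includes root itself
        visited += 1
        if el.tag == tag:
            hits.append(el)
    return hits, visited


def path_search(root, steps):
    """Model of /steps[0]/steps[1]/...: test only the children at each step."""
    visited = 1  # the root element, tested against steps[0]
    current = [root] if root.tag == steps[0] else []
    for step in steps[1:]:
        matches = []
        for el in current:
            for child in el:  # child axis: direct children only
                visited += 1
                if child.tag == step:
                    matches.append(child)
        current = matches
    return current, visited


root = ET.fromstring(DOC)
hits_d, visited_d = descendant_search(root, "h1")
hits_p, visited_p = path_search(root, ["html", "body", "h1"])
print(len(hits_d), visited_d)  # matches and elements tested by the // model
print(len(hits_p), visited_p)  # matches and elements tested by the explicit path
```

Both find the same h1 here, but the explicit path only tests the root, body, and body's direct children, while the descendant walk also tests every nested div and p. The two are not interchangeable in general: //h1 would also match headings nested inside other elements, which /html/body/h1 deliberately excludes.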