
How to get H1, H2, H3, ... using a single XPath expression

Tags:

xpath

How can I get the contents of H1, H2, and H3 with a single XPath expression?

I know I could do this:

//html/body/h1/text()
//html/body/h2/text()
//html/body/h3/text() 

and so on.

Aivan Monceller asked Nov 03 '11 09:11



1 Answer

Use:

/html/body/*[self::h1 or self::h2 or self::h3]/text()
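As a rough illustration, the expression above can be evaluated with the JDK's built-in javax.xml.xpath API. This is only a sketch: the class name HeadingText, the helper select, and the sample markup are made up for the example, and it assumes the input is well-formed XML whose elements are in no namespace (real-world HTML would first need an HTML-aware parser).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HeadingText {
    // The expression from the answer: text children of h1, h2 or h3 directly under /html/body.
    static final String EXPR = "/html/body/*[self::h1 or self::h2 or self::h3]/text()";

    // Hypothetical helper: parses the XML string and returns the string value
    // of every node selected by the XPath expression, in document order.
    static List<String> select(String xml, String expr) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document; the <p> element is not selected.
        String xml = "<html><body><h1>Title</h1><p>Intro</p>"
                + "<h2>Section</h2><h3>Detail</h3></body></html>";
        System.out.println(select(xml, EXPR)); // prints [Title, Section, Detail]
    }
}
```

The predicate filters the children of body in a single step, so the headings come back in document order rather than grouped by level.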

The following expression is incorrect:

//html/body/*[local-name() = "h1"  
           or local-name() = "h2"  
           or local-name() = "h3"]/text()  

because local-name() ignores namespaces, so it may also select text nodes that are children of elements such as unwanted:h1, different:h2, or someWeirdNamespace:h3.
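The difference between the two predicates can be seen on a small document that mixes in a foreign namespace. The sketch below again uses the JDK's javax.xml.xpath API; the class name NamespacePitfall, the helper select, the prefix x, and the namespace URI urn:example:other are all invented for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespacePitfall {
    // Name test via the self axis: matches only h1/h2/h3 in no namespace.
    static final String BY_SELF =
            "/html/body/*[self::h1 or self::h2 or self::h3]/text()";
    // Comparison on local-name(): matches h1/h2/h3 in ANY namespace.
    static final String BY_LOCAL_NAME =
            "/html/body/*[local-name() = 'h1' or local-name() = 'h2'"
            + " or local-name() = 'h3']/text()";

    // Hypothetical helper: returns the values of the selected nodes in document order.
    static List<String> select(String xml, String expr) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required so x:h1 is seen as a namespaced element
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up document: x:h1 belongs to a different namespace than the plain h1.
        String xml = "<html xmlns:x=\"urn:example:other\"><body>"
                + "<h1>Real</h1><x:h1>Foreign</x:h1><h2>Sub</h2>"
                + "</body></html>";
        System.out.println(select(xml, BY_SELF));       // prints [Real, Sub]
        System.out.println(select(xml, BY_LOCAL_NAME)); // prints [Real, Foreign, Sub]
    }
}
```

Only the local-name() version picks up the text of x:h1, which is the unwanted match the answer warns about.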

Another recommendation: always avoid using // when the structure of the XML document is statically known. Using // most often results in significant inefficiencies, because it causes the complete document (sub)tree rooted in the context node to be traversed.

Dimitre Novatchev answered Oct 22 '22 10:10