I was trying to run XPath queries on an HTML document. I wanted to do a two-level XPath query. The HTML document "index.html" is as follows:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Document</title>
</head>
<body>
<div class="head">
<div class="area">
<div class="value">10</div>
</div>
<div class="area">
<div class="value">20</div>
</div>
<div class="area">
<div class="value">30</div>
</div>
</div>
</body>
</html>
I wanted to get all divs with class="area" first, then recursively get the divs with class="value" inside each of them, in Go using Gokogiri.
My Go code is as follows:
package main

import (
	"fmt"
	"io/ioutil"

	"github.com/moovweb/gokogiri"
	"github.com/moovweb/gokogiri/xpath"
)

func main() {
	content, _ := ioutil.ReadFile("index.html")
	doc, _ := gokogiri.ParseHtml(content)
	defer doc.Free()
	xps := xpath.Compile("//div[@class='head']/div[@class='area']")
	xpw := xpath.Compile("//div[@class='value']")
	ss, _ := doc.Root().Search(xps)
	for _, s := range ss {
		ww, _ := s.Search(xpw)
		for _, w := range ww {
			fmt.Println(w.InnerHtml())
		}
	}
}
However, the output I get is odd:
10
20
30
10
20
30
10
20
30
I expected to get:
10
20
30
I want to search for XPath patterns recursively. I think there is something wrong with my second-level XPath pattern. It appears that my second-level XPath searches the whole document again instead of the individual divs with class="area". How do I do a recursive XPath search? I'd appreciate any help.
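An XPath search started from any node can still search the entire tree: an expression that begins with // is evaluated from the document root, no matter which node you call Search on.
If you want to search just the subtree, start the expression with a . (that is, .//, assuming you still want descendant-or-self); otherwise use an exact relative path.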
xps := xpath.Compile("//div[@class='head']/div[@class='area']")
// The leading . anchors the search at the current node (each "area" div).
xpw := xpath.Compile(".//div[@class='value']")
// This also works in your example case, because each "value" div
// is a direct child of its "area" div:
// xpw := xpath.Compile("div[@class='value']")
// as does this:
// xpw := xpath.Compile("./div[@class='value']")
ss, _ := doc.Root().Search(xps)
for _, s := range ss {
	ww, _ := s.Search(xpw)
	for _, w := range ww {
		fmt.Println(w.InnerHtml())
	}
}
Prints:
10
20
30
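To see why the leading . matters without installing Gokogiri, here is a minimal sketch that models the two behaviours on a hand-built tree. It is not Gokogiri or a real XPath engine: the Node type and the searchAbsolute and searchRelative helpers are hypothetical names made up for this illustration, and they only imitate what // and .// do with a class match.

```go
package main

import "fmt"

// Node is a hypothetical, minimal stand-in for an HTML element.
type Node struct {
	Class    string
	Text     string
	Parent   *Node
	Children []*Node
}

// add appends a new child with the given class and text and returns it.
func (n *Node) add(class, text string) *Node {
	c := &Node{Class: class, Text: text, Parent: n}
	n.Children = append(n.Children, c)
	return c
}

// root walks up the parent links to the top of the tree.
func (n *Node) root() *Node {
	for n.Parent != nil {
		n = n.Parent
	}
	return n
}

// descendants collects the text of every descendant of n with the given class.
func descendants(n *Node, class string) []string {
	var out []string
	for _, c := range n.Children {
		if c.Class == class {
			out = append(out, c.Text)
		}
		out = append(out, descendants(c, class)...)
	}
	return out
}

// searchAbsolute imitates "//div[@class='...']": the context node is
// ignored and the search starts again from the top of the tree.
func searchAbsolute(ctx *Node, class string) []string {
	return descendants(ctx.root(), class)
}

// searchRelative imitates ".//div[@class='...']": only the subtree
// below the context node is searched.
func searchRelative(ctx *Node, class string) []string {
	return descendants(ctx, class)
}

// buildDoc recreates the head/area/value structure from the question.
func buildDoc() *Node {
	head := &Node{Class: "head"}
	for _, v := range []string{"10", "20", "30"} {
		head.add("area", "").add("value", v)
	}
	return head
}

func main() {
	head := buildDoc()
	for _, area := range head.Children {
		// The absolute search returns all three values for every area;
		// the relative search returns only that area's own value.
		fmt.Println(searchAbsolute(area, "value"), searchRelative(area, "value"))
	}
}
```

Each loop iteration prints the full list next to the single value for that area, which mirrors the repeated 10 20 30 output in the question versus the expected output.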
Your second query, //div[@class='value'], selects divs anywhere in the document regardless of the parent element. Instead, try div[@class='value'], which is evaluated relative to the current node and matches only its direct children.