
Golang Gokogiri recursive xpath anomaly

I am trying to run XPath queries against an HTML document, and I want to do a two-level query. The HTML document "index.html" is as follows:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
    <div class="head">
        <div class="area">
            <div class="value">10</div>
        </div>
        <div class="area">
            <div class="value">20</div>
        </div>
        <div class="area">
            <div class="value">30</div>
        </div>
    </div>
</body>
</html>

I want to get all divs with class="area" first, then search inside each of them for divs with class="value", in Go using Gokogiri.

My Go code is as follows:

package main

import (
    "fmt"
    "io/ioutil"

    "github.com/moovweb/gokogiri"
    "github.com/moovweb/gokogiri/xpath"
)

func main() {
    content, _ := ioutil.ReadFile("index.html")

    doc, _ := gokogiri.ParseHtml(content)
    defer doc.Free()

    xps := xpath.Compile("//div[@class='head']/div[@class='area']")
    xpw := xpath.Compile("//div[@class='value']")
    ss, _ := doc.Root().Search(xps)
    for _, s := range ss {
        ww, _ := s.Search(xpw)
        for _, w := range ww {
            fmt.Println(w.InnerHtml())
        }
    }
}

However, the output I get is odd:

10
20
30
10
20
30
10
20
30

I expected to get:

10
20
30

I want to search for XPath patterns recursively. I think there is something wrong with my second-level XPath pattern: it appears to search the whole document again instead of only the individual divs with class="area". How do I do a recursive XPath search? I'd appreciate any help.

ArunL asked Feb 12 '23 05:02


2 Answers

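An XPath search started from any node can still search the entire tree: an expression that begins with // is absolute, so it is evaluated from the document root regardless of which node you call Search on.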

If you want to search just the subtree, start the expression with a . (assuming you still want descendant-or-self); otherwise use an exact path.

xps := xpath.Compile("//div[@class='head']/div[@class='area']")
xpw := xpath.Compile(".//div[@class='value']")

// this works in your example case
// xpw := xpath.Compile("div[@class='value']")
// as does this
// xpw := xpath.Compile("./div[@class='value']")

ss, _ := doc.Root().Search(xps)
for _, s := range ss {
    ww, _ := s.Search(xpw)
    for _, w := range ww {
        fmt.Println(w.InnerHtml())
    }
}

Prints:

10
20
30
JimB answered Feb 25 '23 13:02


Your second query, //div[@class='value'], will select divs anywhere in the document, regardless of the node you search from. Instead, try the relative path div[@class='value'], which selects only children of the current node.

LarsH answered Feb 25 '23 13:02