
Any way to use html.Parse without it adding nodes to make a 'well-formed tree'?

Tags:

go

package main

import (
    "bytes"
    "fmt"
    "log"
    "strings"

    // The package formerly at code.google.com/p/go.net/html now lives here.
    "golang.org/x/net/html"
)

func main() {
    s := "Blah. <b>Blah.</b> Blah."
    n, err := html.Parse(strings.NewReader(s))
    if err != nil {
        log.Fatalf("Parse error: %s", err)
    }
    var buf bytes.Buffer
    if err := html.Render(&buf, n); err != nil {
        log.Fatalf("Render error: %s", err)
    }
    fmt.Println(buf.String())
}

Output:

<html><head></head><body>Blah. <b>Blah.</b> Blah.</body></html>

Is there a way to stop html.Parse from turning a fragment into a full document (i.e. to avoid adding <html>, <head>, <body> etc.)? I'm aware of html.ParseFragment, but it seems to exhibit the same behaviour.

You can work around it by wrapping the text to be parsed in a parent element such as <span>, then walking down to that element with something like the following:

n = n.FirstChild.LastChild.FirstChild

but that seems, well, kludgy to say the least.

Ideally I'd like to: accept input, manipulate or remove nodes found within it, and write the result back to a string, even if the result is an incomplete document.

Rich Churcher asked Feb 26 '13 04:02


2 Answers

You need to provide a context to ParseFragment. The following program prints out the original text:

package main

import (
    "bytes"
    "fmt"
    "log"
    "strings"

    // The package formerly at code.google.com/p/go.net/html now lives here.
    "golang.org/x/net/html"
    "golang.org/x/net/html/atom"
)

func main() {
    s := "Blah. <b>Blah.</b> Blah."
    n, err := html.ParseFragment(strings.NewReader(s), &html.Node{
        Type:     html.ElementNode,
        Data:     "body",
        DataAtom: atom.Body,
    })
    if err != nil {
        log.Fatalf("Parse error: %s", err)
    }
    var buf bytes.Buffer
    for _, node := range n {
        if err := html.Render(&buf, node); err != nil {
            log.Fatalf("Render error: %s", err)
        }
    }
    fmt.Println(buf.String())
}
andybalholm answered Nov 19 '22 18:11


You want ParseFragment: http://godoc.org/code.google.com/p/go.net/html#ParseFragment. Pass in a fake Body element as your context, and the fragment is returned as a slice containing just the nodes of your fragment (text nodes as well as elements), with no <html>, <head> or <body> wrapper.

You can see an example in the Partial* functions of go-html-transform's wrapper package for go.net/html: https://code.google.com/p/go-html-transform/source/browse/h5/h5.go#32

Jeremy Wall answered Nov 19 '22 19:11