
Go XML error: invalid character entity

Tags: go

Go fails to parse a valid XML file that declares its own entities. I keep getting this error:

error: XML syntax error on line 47: invalid character entity &n;

Line 47 is <pos>&n;</pos>, and the entity is declared as <!ENTITY n "noun (common) (futsuumeishi)">.

Here is the program in Go: http://play.golang.org/p/94_60srVne

localhost asked Dec 24 '22 22:12


2 Answers

You can supply the entities yourself by creating a Decoder and populating its Entity map. From poking around xml.go, I suspect the package doesn't actually parse DTDs: there is a comment saying it accumulates entities for the caller, but I see nothing that sets entries in d.Entity itself.

(It would even be tricky for encoding/xml to provide that safely, because there is a built-in shared HTML entity map: updating it for one document would affect the parsing of others.)

There's a little more paperwork to create a Decoder with custom entities than there is for regular xml.Unmarshal, but not too much:

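func main() {
    // JMdict and str are assumed to be the type and the XML input
    // from the question's program.
    jmd := JMdict{}

    // xml.Unmarshal has no way to pass entities, so build a Decoder by hand.
    d := xml.NewDecoder(bytes.NewReader([]byte(str)))
    // Keys are entity names without the surrounding & and ;.
    d.Entity = map[string]string{
        "n": "(noun)",
    }
    err := d.Decode(&jmd)
    if err != nil {
        fmt.Printf("error: %v\n", err)
        return
    }
    fmt.Println(jmd)
}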

Here's a Playground link with the Entity trick and some output code to show the object as JSON.

twotwotwo answered Jan 08 '23 16:01


The previous answer is the "right" one, but depending on what you are really trying to accomplish, a "quick" fix is to disable Strict on the decoder, e.g.:

d := xml.NewDecoder(os.Stdin)
d.Strict = false            
tpa10 answered Jan 08 '23 16:01