 

How to parse a huge XML file with various elements in Go?

Tags:

go

How can you parse a huge XML file that has various elements (i.e. not the same element repeated multiple times)?

Example:

<stuff>
    <header>...</header>
    <item>...</item>
    ...
    <item>...</item>
    <something>...</something>
</stuff>

I want to write a script in Go that splits this file into multiple smaller files, each with a specific number of tags. All the examples of parsing XML with Go seem to rely on knowing in advance which elements the file contains.

Can the file be parsed without knowing that, i.e. with something like a loop over each element in the XML regardless of which element it is (header, item, something, etc.)?

daniels asked Apr 14 '16 at 13:04



1 Answer

Use the standard library's xml.Decoder from the encoding/xml package.

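Call Token to read tokens one by one. When a start element of interest is found, call DecodeElement to decode that element into a Go value. The decoder reads from an io.Reader as it goes, so the whole file never has to be loaded into memory.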

Here's a sketch of how to use the decoder:

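import (
    "encoding/xml"
    "fmt"
    "io"
)

// bar is a placeholder for the Go type the element of interest decodes into.
type bar struct{}

// parse (hypothetical name) streams r and decodes every <bar> in namespace "foo".
func parse(r io.Reader) error {
    d := xml.NewDecoder(r)
    for {
        t, tokenErr := d.Token()
        if tokenErr != nil {
            if tokenErr == io.EOF {
                break
            }
            // handle error somehow
            return fmt.Errorf("decoding token: %v", tokenErr)
        }
        switch t := t.(type) {
        case xml.StartElement:
            // Name.Space holds the namespace URL (not the prefix); Name.Local is the tag name.
            if t.Name.Space == "foo" && t.Name.Local == "bar" {
                var b bar
                if err := d.DecodeElement(&b, &t); err != nil {
                    // handle error somehow
                    return fmt.Errorf("decoding element %q: %v", t.Name.Local, err)
                }
                // do something with b
            }
        }
    }
    return nil
}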

Bayta Darell answered Sep 25 '22 at 02:09