mixed XML decoding in golang preserving order

I need to extract offers from an XML document while preserving the order of the nodes:

<items>
  <offer/>
  <product>
    <offer/>
    <offer/>
  </product>
  <offer/>
  <offer/>
</items>

The following struct decodes the values, but into two separate slices, so the original order is lost:

type Offers struct {
    XMLName  xml.Name `xml:"items"`
    Offers   []offer  `xml:"offer"`         // top-level offers
    Products []offer  `xml:"product>offer"` // offers nested in <product>
}

Any ideas?

asked Aug 24 '15 16:08 by Miroslav



1 Answer

One way is to implement the xml.Unmarshaler interface, i.e. give the element type its own UnmarshalXML method. Let's say our input looks like this:

<doc>
    <head>My Title</head>
    <p>A first paragraph.</p>
    <p>A second one.</p>
</doc>

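We want to deserialize the document and preserve the order of the head and the paragraphs. To keep the order we need a slice. To let that slice hold both head and p elements, each entry needs a type that can carry either kind of value; here that is a small wrapper type with an interface{} field. We could define our document like this: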

type Document struct {
    XMLName  xml.Name `xml:"doc"`
    Contents []Mixed  `xml:",any"`
}

The ,any tag option collects every child element that is not matched by another field into Contents, in document order. Its element type, Mixed, still needs to be defined:

type Mixed struct {
    Type  string      // just keep "head" or "p" in here
    Value interface{} // keep the value, we could use string here, too
}

We need more control over the deserialization process, so we turn Mixed into an xml.Unmarshaler by implementing UnmarshalXML. We choose the code path based on the name of the start element, e.g. head or p. Here we only populate our Mixed struct with a name and a string value, but the method is free to decode each element however it likes:

func (m *Mixed) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
    switch start.Name.Local {
    case "head", "p":
        var e string
        if err := d.DecodeElement(&e, &start); err != nil {
            return err
        }
        m.Value = e
        m.Type = start.Name.Local
    default:
        return fmt.Errorf("unknown element: %s", start.Name.Local)
    }
    return nil
}

Putting it all together, usage of the above structs could look like this:

package main

// Imports used by Document, Mixed and main.
import (
    "encoding/xml"
    "fmt"
    "log"
)

func main() {
    s := `
    <doc>
        <head>My Title</head>
        <p>A first paragraph.</p>
        <p>A second one.</p>
    </doc>
    `

    var doc Document
    if err := xml.Unmarshal([]byte(s), &doc); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("#%v\n", doc)
}

This prints:

#{{ doc} [{head My Title} {p A first paragraph.} {p A second one.}]}

We preserved the order and kept some type information. Instead of a single type like Mixed, you could use several different types for the deserialization. The cost of this approach is that the decoded values sit behind an interface (here Mixed.Value; with several element types, the Contents slice itself would hold an interface type). To do anything element-specific, you'll need a type assertion or a helper method.

The complete code is on the Go Playground: https://play.golang.org/p/fzsUPPS7py

answered Nov 15 '22 06:11 by miku
