I need to extract offers from an XML document while preserving the order of the nodes:
<items>
	<offer/>
	<product>
		<offer/>
		<offer/>
	</product>
	<offer/>
	<offer/>
</items>
The following struct decodes the values, but into two separate slices, so the original order is lost:
type Offers struct {
	// Paths are relative to the root <items> element, which xml.Unmarshal
	// matches to the struct itself.
	Offers   []offer `xml:"offer"`
	Products []offer `xml:"product>offer"`
}
Any ideas?
One way would be to implement a custom UnmarshalXML method. Let's say our input looks like this:
<doc>
<head>My Title</head>
<p>A first paragraph.</p>
<p>A second one.</p>
</doc>
We want to deserialize the document and preserve the order of the head and the paragraphs. To keep the order we need a slice, and to accommodate both head and p elements in that slice we need an interface value. We could define our document like this:
type Document struct {
	XMLName  xml.Name `xml:"doc"`
	Contents []Mixed  `xml:",any"`
}
The ,any annotation collects every child element that no other field matches into Contents, in document order. Its element type, Mixed, is one we need to define ourselves:
type Mixed struct {
	Type  string      // just keep "head" or "p" in here
	Value interface{} // keep the value, we could use string here, too
}
We need more control over the deserialization process, so we turn Mixed into an xml.Unmarshaler by implementing UnmarshalXML. We choose the code path based on the name of the start element, e.g. head or p. Here we only populate our Mixed struct with some values, but you can do essentially anything at this point:
// UnmarshalXML is called once for every child element collected by ,any.
func (m *Mixed) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	switch start.Name.Local {
	case "head", "p":
		// Read the element's character data into a string.
		var e string
		if err := d.DecodeElement(&e, &start); err != nil {
			return err
		}
		m.Value = e
		m.Type = start.Name.Local
	default:
		return fmt.Errorf("unknown element: %s", start.Name.Local)
	}
	return nil
}
Putting it all together, usage of the above structs could look like this:
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Document, Mixed and UnmarshalXML as defined above.

func main() {
	s := `
<doc>
	<head>My Title</head>
	<p>A first paragraph.</p>
	<p>A second one.</p>
</doc>
`
	var doc Document
	if err := xml.Unmarshal([]byte(s), &doc); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%v\n", doc)
}
This prints:
{{ doc} [{head My Title} {p A first paragraph.} {p A second one.}]}
We preserved the order and kept some type information. Instead of a single type like Mixed, you could decode into several different types. The cost of this approach is that the values in the container (here, the Value field of each entry in the document's Contents) are interfaces, so to do anything element-specific you need a type assertion or a helper method.
Complete code on play: https://play.golang.org/p/fzsUPPS7py
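Applied to the original offers question, the same idea might look like the sketch below: every direct child of <items> is collected through ,any, a custom UnmarshalXML handles both <offer> and <product>, and the offers are then flattened into one slice in document order. The id attribute and the names node, Items and orderedOffers are hypothetical; the question does not show the real offer fields.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// offer is a hypothetical stand-in; the id attribute is assumed.
type offer struct {
	ID string `xml:"id,attr"`
}

// node is one direct child of <items>: a lone offer or a product
// wrapping several offers.
type node struct {
	Offers []offer
}

func (n *node) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	switch start.Name.Local {
	case "offer":
		var o offer
		if err := d.DecodeElement(&o, &start); err != nil {
			return err
		}
		n.Offers = []offer{o}
	case "product":
		var p struct {
			Offers []offer `xml:"offer"`
		}
		if err := d.DecodeElement(&p, &start); err != nil {
			return err
		}
		n.Offers = p.Offers
	default:
		// Ignore any other element by consuming it up to its end tag.
		return d.Skip()
	}
	return nil
}

type Items struct {
	XMLName xml.Name `xml:"items"`
	Nodes   []node   `xml:",any"`
}

// orderedOffers returns all offers, nested or not, in document order.
func orderedOffers(data []byte) ([]offer, error) {
	var items Items
	if err := xml.Unmarshal(data, &items); err != nil {
		return nil, err
	}
	var all []offer
	for _, n := range items.Nodes {
		all = append(all, n.Offers...)
	}
	return all, nil
}

const input = `<items>
	<offer id="1"/>
	<product><offer id="2"/><offer id="3"/></product>
	<offer id="4"/>
	<offer id="5"/>
</items>`

func main() {
	offers, err := orderedOffers([]byte(input))
	if err != nil {
		panic(err)
	}
	fmt.Println(offers) // [{1} {2} {3} {4} {5}]
}
```

If the product grouping matters, keep the nodes slice instead of flattening it.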