I have a listener which receives protobuf messages. However it doesn't know which type of message comes in when. So I tried to unmarshal into an interface{}
so I can later type cast:
var data interface{}
err := proto.Unmarshal(message, data)
if err != nil {
log.Fatal("unmarshaling error: ", err)
}
log.Printf("%v\n", data)
However this code doesn't compile:
cannot use data (type interface {}) as type proto.Message in argument to proto.Unmarshal:
interface {} does not implement proto.Message (missing ProtoMessage method)
How can I unmarshal and later type cast an "unknown" protobuf message in go?
First, two words about the OP's question, as presented by them:
proto.Unmarshal
can't unmarshal into an interface{}
. The method signature is obvious, you must pass a proto.Message
argument, which is an interface implemented by concrete protobuffer types.
When handling a raw protobuffer []byte
payload that didn't come in an Any
, ideally you have at least something (a string, a number, etc...) coming together with the byte slice, that you can use to map to the concrete protobuf message.
You can then switch on that and instantiate the appropriate protobuf concrete type, and only then pass that argument to Unmarshal
:
var message proto.Message
switch atLeastSomething {
case "foo":
message = &mypb.Foo{}
case "bar":
message = &mypb.Bar{}
}
_ = proto.Unmarshal(message, data)
As a foreword, consider that this should seldom happen in practice. The schema used to generate the protobuffer types in your language of choice represents a contract, and by accepting protobuffer payloads you are, for some definitions of it, fulfilling that contract.
Anyway, if for some reason you must deal with a completely unknown, mysterious, protobuffer payload in wire format, you can extract some information from it with the protowire
package.
Be aware that the wire representation of a protobuf message is ambiguous. A big source of uncertainty is the "length-delimited" type (2
) being used for strings, bytes, repeated fields and... sub-messages (reference).
You can retrieve the payload content, but you are bound to have weak semantics.
With that said, this is what a parser for unknown proto messages may look like. The idea is to leverage protowire.ConsumeField
to read through the original byte slice.
The data model could be like this:
type Field struct {
Tag Tag
Val Val
}
type Tag struct {
Num int32
Type protowire.Type
}
type Val struct {
Payload interface{}
Length int
}
And the parser:
func parseUnknown(b []byte) []Field {
fields := make([]Field, 0)
for len(b) > 0 {
n, t, fieldlen := protowire.ConsumeField(b)
if fieldlen < 1 {
return nil
}
field := Field{
Tag: Tag{Num: int32(n), Type: t },
}
_, _, taglen := protowire.ConsumeTag(b[:fieldlen])
if taglen < 1 {
return nil
}
var (
v interface{}
vlen int
)
switch t {
case protowire.VarintType:
v, vlen = protowire.ConsumeVarint(b[taglen:fieldlen])
case protowire.Fixed64Type:
v, vlen = protowire.ConsumeFixed64(b[taglen:fieldlen])
case protowire.BytesType:
v, vlen = protowire.ConsumeBytes(b[taglen:fieldlen])
sub := parseUnknown(v.([]byte))
if sub != nil {
v = sub
}
case protowire.StartGroupType:
v, vlen = protowire.ConsumeGroup(n, b[taglen:fieldlen])
sub := parseUnknown(v.([]byte))
if sub != nil {
v = sub
}
case protowire.Fixed32Type:
v, vlen = protowire.ConsumeFixed32(b[taglen:fieldlen])
}
if vlen < 1 {
return nil
}
field.Val = Val{Payload: v, Length: vlen - taglen}
// fmt.Printf("%#v\n", field)
fields = append(fields, field)
b = b[fieldlen:]
}
return fields
}
Given a proto schema like:
message Foo {
string a = 1;
string b = 2;
Bar bar = 3;
}
message Bar {
string c = 1;
}
initialized in Go as:
&test.Foo{A: "A", B: "B", Bar: &test.Bar{C: "C"}}
And by adding a fmt.Printf("%#v\n", field)
statement at the end of the loop in the above code, it will output the following:
main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x41}, Length:1}}
main.Field{Tag:main.Tag{Num:2, Type:2}, Val:main.Val{Payload:[]uint8{0x42}, Length:1}}
main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x43}, Length:1}}
main.Field{Tag:main.Tag{Num:3, Type:2}, Val:main.Val{Payload:[]main.Field{main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x43}, Length:1}}}, Length:3}}
About sub-messages
As you can see from the above the idea to deal with a protowire.BytesType
that may or may not be a message field is to attempt to parse it, recursively. If it succeeds, we keep the resulting msg
and store it in the field value, if it fails, we store the bytes as-is, which then may be a proto string
or bytes
. BTW, if I'm reading correctly, this seems what Marc Gravell does in the Protogen code.
About repeated fields
The code above doesn't deal with repeated fields explicitly, but after the parsing is done, repeated fields will have the same value for Field.Tag.Num
. From that, packing the fields into a slice/array should be trivial.
About maps
The code above also doesn't deal with proto maps. I suspect that maps are semantically equivalent to a repeated k/v pair, e.g.:
message Pair {
string key = 1; // or whatever key type
string val = 2; // or whatever val type
}
If my assumption is correct, then maps can be parsed with the given code as sub-messages.
About oneof
s
I haven't yet tested this, but I expect that information about the union type are completely lost. The byte payload will contain only the value that was actually set.
Any
?The Any
proto type doesn't fit in the picture. Contrary to what it may look like, Any
is not analogous to, say, map[string]interface{}
for JSON objects. And the reason is simple: Any
is a proto message with a very well defined structure, namely (in Go):
type Any struct {
// unexported fields
TypeUrl string // struct tags omitted
Value []byte // struct tags omitted
}
So it is more similar to the implementation of a Go interface{}
in that it holds some actual data and that data's type information.
It can hold itself arbitrary proto payloads (with their type information!) but it can not be used to decode unknown messages, because Any
has exactly those two fields, type url and a byte payload.
To wrap up, this answer doesn't provide a full-blown production-grade solution, but it shows how to decode arbitrary payloads while preserving as much original semantics as possible. Hopefully it will point you in the right direction.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With