
protobuf unmarshal unknown message

I have a listener which receives protobuf messages. However it doesn't know which type of message comes in when. So I tried to unmarshal into an interface{} so I can later type cast:

var data interface{}
err := proto.Unmarshal(message, data)
if err != nil {
  log.Fatal("unmarshaling error: ", err)
}
log.Printf("%v\n", data)

However this code doesn't compile:

cannot use data (type interface {}) as type proto.Message in argument to proto.Unmarshal:
  interface {} does not implement proto.Message (missing ProtoMessage method)

How can I unmarshal and later type cast an "unknown" protobuf message in go?

asked Oct 29 '22 14:10 by gucki


1 Answer

First, two words about the OP's question, as presented by them:

proto.Unmarshal can't unmarshal into an interface{}. Its signature, func Unmarshal(b []byte, m Message) error, requires a proto.Message argument, which is an interface implemented by the concrete generated protobuf types.

When handling a raw protobuffer []byte payload that didn't come in an Any, ideally you have at least something (a string, a number, etc...) coming together with the byte slice, that you can use to map to the concrete protobuf message.

You can then switch on that and instantiate the appropriate protobuf concrete type, and only then pass that argument to Unmarshal:

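var message proto.Message
switch atLeastSomething {
case "foo":
    message = &mypb.Foo{}
case "bar":
    message = &mypb.Bar{}
default:
    // Without a concrete type there is nothing to unmarshal into.
    log.Fatalf("unknown message type: %q", atLeastSomething)
}
// Unmarshal takes the raw []byte payload first and the target message second.
if err := proto.Unmarshal(data, message); err != nil {
    log.Fatal("unmarshaling error: ", err)
}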
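
How the discriminator travels with the payload is up to the sender and receiver. As a minimal sketch, assuming a made-up frame layout of [1-byte name length][name][protobuf payload] (the layout and the helper names makeFrame and splitFrame are hypothetical, not part of any protobuf library), the receiver could recover the name to switch on like this:

```go
package main

import (
	"errors"
	"fmt"
)

// makeFrame builds a hypothetical frame: [1-byte name length][name][payload].
// Names longer than 255 bytes do not fit the one-byte length and are rejected.
func makeFrame(name string, payload []byte) ([]byte, error) {
	if len(name) > 255 {
		return nil, errors.New("name too long")
	}
	frame := append([]byte{byte(len(name))}, name...)
	return append(frame, payload...), nil
}

// splitFrame separates such a frame into the type name and the raw payload.
func splitFrame(frame []byte) (string, []byte, error) {
	if len(frame) < 1 {
		return "", nil, errors.New("empty frame")
	}
	n := int(frame[0])
	if len(frame) < 1+n {
		return "", nil, errors.New("truncated frame")
	}
	return string(frame[1 : 1+n]), frame[1+n:], nil
}

func main() {
	frame, err := makeFrame("foo", []byte{0x0a, 0x01, 0x41})
	if err != nil {
		panic(err)
	}
	name, payload, err := splitFrame(frame)
	// name is what the switch would dispatch on; payload goes to proto.Unmarshal.
	fmt.Println(name, payload, err)
}
```

In a real system the same role is often played by a message header, a topic name, or a URL path, so a custom frame like this is only one of several options.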

Now, what if the byte payload is truly unknown?

As a foreword, consider that this should seldom happen in practice. The schema used to generate the protobuf types in your language of choice represents a contract, and by accepting protobuf payloads you are, in some sense, agreeing to that contract.

Anyway, if for some reason you must deal with a completely unknown, mysterious, protobuffer payload in wire format, you can extract some information from it with the protowire package.

Be aware that the wire representation of a protobuf message is ambiguous. A big source of uncertainty is the length-delimited wire type (2), which is used for strings, bytes, packed repeated fields and... sub-messages (see the encoding guide in the protobuf documentation).

You can retrieve the payload content, but you are bound to have weak semantics.
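
To make the ambiguity concrete, here is a small standard-library sketch of how the key that starts every field record splits into a field number and a wire type. The helper name decodeTag is made up for illustration; in real code protowire.ConsumeTag does this job.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeTag reads the varint key that starts a field record and splits it
// into field number and wire type, following key = (number << 3) | wireType.
// The last result is the number of bytes read; a value <= 0 signals a
// malformed or missing varint.
func decodeTag(b []byte) (uint64, uint8, int) {
	key, size := binary.Uvarint(b)
	if size <= 0 {
		return 0, 0, size
	}
	return key >> 3, uint8(key & 7), size
}

func main() {
	// Field 1 holding the single byte 0x41.
	msg := []byte{0x0a, 0x01, 0x41}
	num, wt, n := decodeTag(msg)
	fmt.Printf("field=%d wireType=%d tagBytes=%d\n", num, wt, n)
}
```

The wire type reported here is 2, which only says "length-delimited". Nothing in these bytes tells you whether 0x41 is the string "A", a one-byte bytes value, or a packed repeated field with a single element.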

The code

With that said, this is what a parser for unknown proto messages may look like. The idea is to leverage protowire.ConsumeField to read through the original byte slice.

The data model could be like this:

type Field struct {
    Tag Tag
    Val Val
}

type Tag struct {
    Num int32
    Type protowire.Type
}

type Val struct {
    Payload interface{}
    Length int
}

And the parser:

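// parseUnknown walks a protobuf payload in wire format and returns its fields.
// It returns nil if the bytes cannot be parsed as a sequence of field records.
func parseUnknown(b []byte) []Field {
    fields := make([]Field, 0)
    for len(b) > 0 {
        // Consume one whole field record (tag and value) to learn its total length.
        n, t, fieldlen := protowire.ConsumeField(b)
        if fieldlen < 1 {
            return nil
        }
        field := Field{
            Tag: Tag{Num: int32(n), Type: t},
        }

        // Measure the tag so the value can be sliced out of the record.
        _, _, taglen := protowire.ConsumeTag(b[:fieldlen])
        if taglen < 1 {
            return nil
        }

        var (
            v    interface{}
            vlen int // bytes consumed by the value, including any length prefix
            plen int // size of the payload itself
        )
        switch t {
        case protowire.VarintType:
            v, vlen = protowire.ConsumeVarint(b[taglen:fieldlen])
            plen = vlen

        case protowire.Fixed64Type:
            v, vlen = protowire.ConsumeFixed64(b[taglen:fieldlen])
            plen = vlen

        case protowire.BytesType:
            var raw []byte
            raw, vlen = protowire.ConsumeBytes(b[taglen:fieldlen])
            v, plen = raw, len(raw)
            // The bytes may be a string, raw bytes, or a nested message:
            // try the message interpretation and keep the bytes otherwise.
            if sub := parseUnknown(raw); sub != nil {
                v = sub
            }

        case protowire.StartGroupType:
            var raw []byte
            raw, vlen = protowire.ConsumeGroup(n, b[taglen:fieldlen])
            v, plen = raw, len(raw)
            if sub := parseUnknown(raw); sub != nil {
                v = sub
            }

        case protowire.Fixed32Type:
            v, vlen = protowire.ConsumeFixed32(b[taglen:fieldlen])
            plen = vlen
        }

        // vlen stays 0 for unsupported wire types and is negative on parse errors.
        if vlen < 1 {
            return nil
        }

        // Length is the payload size; the original vlen - taglen was only
        // correct when both the tag and the length prefix were one byte long.
        field.Val = Val{Payload: v, Length: plen}
        // fmt.Printf("%#v\n", field)

        fields = append(fields, field)
        b = b[fieldlen:]
    }
    return fields
}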

Sample input and output

Given a proto schema like:

message Foo {
  string a = 1;
  string b = 2;
  Bar bar = 3;
}

message Bar {
  string c = 1;
}

initialized in Go as:

&test.Foo{A: "A", B: "B", Bar: &test.Bar{C: "C"}}

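With a fmt.Printf("%#v\n", field) statement added at the end of the loop in the above code, parsing the encoded message prints the following. The third line is the nested field of Bar, printed by the recursive call before the enclosing field 3: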

main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x41}, Length:1}}
main.Field{Tag:main.Tag{Num:2, Type:2}, Val:main.Val{Payload:[]uint8{0x42}, Length:1}}
main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x43}, Length:1}}
main.Field{Tag:main.Tag{Num:3, Type:2}, Val:main.Val{Payload:[]main.Field{main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x43}, Length:1}}}, Length:3}}

About sub-messages

As you can see from the above, the way to deal with a protowire.BytesType value that may or may not be a message is to attempt to parse it recursively. If parsing succeeds, we keep the resulting fields and store them as the field value; if it fails, we store the bytes as-is, and they may then be a proto string or bytes. Note that this is a heuristic: a string or bytes value that happens to be valid wire format is reported as a sub-message. BTW, if I'm reading correctly, this seems to be what Marc Gravell does in the Protogen code.

About repeated fields

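The code above doesn't deal with repeated fields explicitly, but after the parsing is done, the elements of a non-packed repeated field will all have the same value for Field.Tag.Num. From that, packing the fields into a slice/array should be trivial. Packed repeated fields (the proto3 default for scalar numeric types) are different: they arrive as a single length-delimited field whose payload is the concatenated values.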
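
A minimal sketch of that grouping step might look as follows. The Field type here is a simplified stand-in for the one defined earlier (just a number and a payload), and groupByNumber is a hypothetical helper name.

```go
package main

import "fmt"

// Field is a simplified stand-in for the parser's Field type:
// only the field number and the decoded payload are kept.
type Field struct {
	Num     int32
	Payload interface{}
}

// groupByNumber collects the payloads that share a field number,
// preserving the order in which they appeared on the wire.
func groupByNumber(fields []Field) map[int32][]interface{} {
	out := make(map[int32][]interface{})
	for _, f := range fields {
		out[f.Num] = append(out[f.Num], f.Payload)
	}
	return out
}

func main() {
	parsed := []Field{
		{Num: 1, Payload: "a"},
		{Num: 2, Payload: "b"},
		{Num: 1, Payload: "c"},
	}
	grouped := groupByNumber(parsed)
	fmt.Println(len(grouped[1]), len(grouped[2]))
}
```

Keep in mind that a slice with one element does not prove the field is repeated: a singular field and a repeated field with a single value look the same on the wire.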

About maps

The code above also doesn't deal with proto maps. I suspect that maps are semantically equivalent to a repeated k/v pair, e.g.:

message Pair {
    string key = 1; // or whatever key type
    string val = 2; // or whatever val type
}

If my assumption is correct, then maps can be parsed with the given code as sub-messages.
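
The protobuf encoding documentation describes a map field as wire-equivalent to a repeated entry message with the key in field 1 and the value in field 2, which supports the assumption. The following standard-library sketch hand-encodes one map<string, string> entry under that layout; the helper names and the field number 4 are made up for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendBytesField appends one length-delimited field record to dst:
// the key (num<<3 | 2), the payload length, then the payload itself.
func appendBytesField(dst []byte, num uint64, payload []byte) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], num<<3|2)
	dst = append(dst, buf[:n]...)
	n = binary.PutUvarint(buf[:], uint64(len(payload)))
	dst = append(dst, buf[:n]...)
	return append(dst, payload...)
}

// encodeStringMapEntry encodes a single entry of a map<string, string> field
// with number num as a nested message with key = 1 and value = 2.
func encodeStringMapEntry(num uint64, key, val string) []byte {
	entry := appendBytesField(nil, 1, []byte(key))
	entry = appendBytesField(entry, 2, []byte(val))
	return appendBytesField(nil, num, entry)
}

func main() {
	// One entry {"k": "v"} of a hypothetical map field number 4.
	fmt.Printf("% x\n", encodeStringMapEntry(4, "k", "v"))
}
```

Each map entry is therefore just another length-delimited field whose payload is itself a valid message, so the recursive branch of the parser picks it up like any other sub-message.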

About oneofs

I haven't yet tested this, but I expect that information about the union type is completely lost. The byte payload will contain only the value that was actually set.

But what about Any?

The Any proto type doesn't fit in the picture. Contrary to what it may look like, Any is not analogous to, say, map[string]interface{} for JSON objects. And the reason is simple: Any is a proto message with a very well defined structure, namely (in Go):

type Any struct {
    // unexported fields
    TypeUrl string // struct tags omitted
    Value []byte   // struct tags omitted
}

So it is more similar to the implementation of a Go interface{} in that it holds some actual data and that data's type information.

It can itself hold arbitrary proto payloads (with their type information!), but it cannot be used to decode unknown messages, because Any has exactly those two fields: a type URL and a byte payload.
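
As a rough illustration, the sketch below uses a stand-in struct with the same two fields (not the real anypb.Any type) and a hypothetical helper, messageName, that takes the part of the type URL after the last slash. The URL and the message name test.Bar are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// anyLike mirrors the two exported fields of Any. It is a stand-in for
// illustration only, not the generated anypb.Any type.
type anyLike struct {
	TypeURL string
	Value   []byte
}

// messageName returns the part of a type URL after the last slash,
// which by convention is the fully-qualified message name.
func messageName(typeURL string) string {
	if i := strings.LastIndexByte(typeURL, '/'); i >= 0 {
		return typeURL[i+1:]
	}
	return typeURL
}

func main() {
	a := anyLike{
		TypeURL: "type.googleapis.com/test.Bar",
		Value:   []byte{0x0a, 0x01, 0x43},
	}
	// The name tells the receiver which generated type must decode Value.
	// Without that type, Value is as opaque as any other unknown payload.
	fmt.Println(messageName(a.TypeURL))
}
```

In other words, Any moves the "at least something" discriminator into the message itself, but the receiver still needs the concrete type that the name refers to.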


To wrap up, this answer doesn't provide a full-blown production-grade solution, but it shows how to decode arbitrary payloads while preserving as much original semantics as possible. Hopefully it will point you in the right direction.

answered Nov 09 '22 14:11 by blackgreen