
Go json.Unmarshal key with \u0000 \x00

Here is the Go playground link.

Basically there are some special characters ('\u0000') in my JSON string key:

var j = []byte(`{"Page":1,"Fruits":["5","6"],"\u0000*\u0000_errorMessages":{"x":"123"},"*_successMessages":{"ok":"hi"}}`)

I want to Unmarshal it into a struct:

type Response1 struct {
    Page   int
    Fruits []string
    Msg    interface{} `json:"*_errorMessages"`
    Msg1   interface{} `json:"\\u0000*\\u0000_errorMessages"`
    Msg2   interface{} `json:"\u0000*\u0000_errorMessages"`
    Msg3   interface{} `json:"\0*\0_errorMessages"`
    Msg4   interface{} `json:"\\0*\\0_errorMessages"`
    Msg5   interface{} `json:"\x00*\x00_errorMessages"`
    Msg6   interface{} `json:"\\x00*\\x00_errorMessages"`
    SMsg   interface{} `json:"*_successMessages"`
}

I tried all of the tag variants above, but none of them works. This link might help: golang.org/src/encoding/json/encode_test.go.

Lei Cao asked Sep 08 '15 08:09



1 Answer

Short answer: With the current json implementation it is not possible using only struct tags.

Note: This is a restriction of the json package implementation, not of the struct tag specification.


Some background: you specified your tags with a raw string literal:

The value of a raw string literal is the string composed of the uninterpreted (implicitly UTF-8-encoded) characters between the quotes...

So the compiler does no unescaping or unquoting of the content of a raw string literal.
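To make the difference concrete, here is a minimal sketch comparing the same characters written as a raw string literal and as an interpreted string literal. The helper name rawAndInterpreted is hypothetical and exists only for this illustration.

```go
package main

import "fmt"

// rawAndInterpreted (hypothetical helper) returns the same source characters
// written once as a raw string literal and once as an interpreted one.
func rawAndInterpreted() (raw, interpreted string) {
	raw = `\u0000`         // six bytes: backslash, 'u', '0', '0', '0', '0'
	interpreted = "\u0000" // one byte: the NUL character
	return raw, interpreted
}

func main() {
	raw, interpreted := rawAndInterpreted()
	fmt.Println(len(raw), len(interpreted)) // prints "6 1"
	fmt.Printf("%q %q\n", raw, interpreted)
}
```

A struct tag written between backquotes is therefore stored with the backslash sequence intact; any unescaping has to happen later, at runtime.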

The convention for struct tag values, quoted from the documentation of reflect.StructTag:

By convention, tag strings are a concatenation of optionally space-separated key:"value" pairs. Each key is a non-empty string consisting of non-control characters other than space (U+0020 ' '), quote (U+0022 '"'), and colon (U+003A ':'). Each value is quoted using U+0022 '"' characters and Go string literal syntax.

What this means is that, by convention, a tag is a list of key:"value" pairs, optionally separated by spaces. Keys are quite restricted, but values may be anything, and values (should) use "Go string literal syntax". This means the values are unquoted at runtime (by a call to strconv.Unquote(), called from StructTag.Get(), in source file reflect/type.go, currently line #809).

So there is no need to double the backslashes. Here is a simplified version of your example:

type Response1 struct {
    Page   int
    Fruits []string
    Msg    interface{} `json:"\u0000_abc"`
}

Now the following code:

t := reflect.TypeOf(Response1{})
fmt.Printf("%#v\n", t.Field(2).Tag)
fmt.Printf("%#v\n", t.Field(2).Tag.Get("json"))

Prints:

"json:\"\\u0000_abc\""
"\x00_abc"

As you can see, the value part for the json key is "\x00_abc" so it properly contains the zero character.

But how will the json package use this?

The json package uses the value returned by StructTag.Get() (from the reflect package), exactly what we did. You can see it in the json/encode.go source file, typeFields() function, currently line #1032. So far so good.

Then it calls the unexported json.parseTag() function, in json/tags.go source file, currently line #17. This cuts off everything after the first comma (which becomes the "tag options"), leaving only the name.
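As a rough sketch of that splitting step (not the package's actual code), the following uses a hypothetical splitTag function built on strings.Cut, which requires Go 1.18 or later. The omitempty option is only an example input.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTag is a hypothetical stand-in for the unexported parseTag: it
// separates the field name from the comma-separated options that follow it.
func splitTag(tag string) (name, options string) {
	name, options, _ = strings.Cut(tag, ",")
	return name, options
}

func main() {
	// The input is the already-unquoted tag value, as returned by StructTag.Get().
	name, options := splitTag("\x00*\x00_errorMessages,omitempty")
	fmt.Printf("%q %q\n", name, options)
}
```

The name part still contains the zero characters at this point; nothing has rejected them yet.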

And finally json.isValidTag() function is called with the previous value, in source file json/encode.go, currently line #731. This function checks the runes of the passed string, and (besides a set of pre-defined allowed characters "!#$%&()*+-./:<=>?@[]^_{|}~ ") rejects everything that is not a unicode letter or digit (as defined by unicode.IsLetter() and unicode.IsDigit()):

if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
    return false
} 

'\u0000' is not part of the pre-defined allowed characters, and as you can guess now, it is neither a letter nor a digit:

// Following code prints "INVALID":
c := '\u0000'
if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
    fmt.Println("INVALID")
}

And since isValidTag() returns false, the name (which is the value for the json key, without the "tag options" part) will be discarded (name = "") and not used; the field then falls back to its Go field name. So no match will be found for the struct field whose tag contains a unicode zero.
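The effect can be observed with a small program. This is a sketch assuming the standard encoding/json behavior described above; the response type and the decode helper are hypothetical names used only here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// response is a hypothetical, trimmed-down version of the struct in the question.
type response struct {
	Page int
	// The tag value unquotes to "\x00*\x00_errorMessages", which fails the
	// validity check, so the field is matched by its Go name "Msg" instead.
	Msg  interface{} `json:"\u0000*\u0000_errorMessages"`
	SMsg interface{} `json:"*_successMessages"`
}

// decode unmarshals data into a response value.
func decode(data []byte) (response, error) {
	var r response
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	data := []byte(`{"Page":1,"\u0000*\u0000_errorMessages":{"x":"123"},"*_successMessages":{"ok":"hi"}}`)
	r, err := decode(data)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(r.Msg == nil)  // true: the key with zero characters was not matched
	fmt.Println(r.SMsg == nil) // false: '*' and '_' are allowed in tag names
}
```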

As an alternative, use a map, a custom json.Unmarshaler, or json.RawMessage.
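One way to combine these three options is sketched below. It is only an illustration, not the only possible approach: the Response type, the errKey constant, and the decodeResponse helper are assumptions made for this example. The custom UnmarshalJSON decodes the regular fields through a method-less alias type, then decodes the same input into a map[string]json.RawMessage and looks up the awkward key by hand.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errKey is the key as it appears after JSON decoding: NUL, '*', NUL, then the name.
const errKey = "\x00*\x00_errorMessages"

// Response is a hypothetical variant of the struct from the question.
type Response struct {
	Page   int
	Fruits []string
	Msg    interface{} `json:"-"` // filled in manually by UnmarshalJSON
	SMsg   interface{} `json:"*_successMessages"`
}

// UnmarshalJSON implements json.Unmarshaler.
func (r *Response) UnmarshalJSON(data []byte) error {
	// alias has the same fields but no methods, which avoids infinite recursion.
	type alias Response
	var a alias
	if err := json.Unmarshal(data, &a); err != nil {
		return err
	}
	// Second pass: keep every value raw and pick out the problematic key.
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	if msg, ok := raw[errKey]; ok {
		if err := json.Unmarshal(msg, &a.Msg); err != nil {
			return err
		}
	}
	*r = Response(a)
	return nil
}

// decodeResponse is a small convenience wrapper around json.Unmarshal.
func decodeResponse(data []byte) (Response, error) {
	var r Response
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	data := []byte(`{"Page":1,"Fruits":["5","6"],"\u0000*\u0000_errorMessages":{"x":"123"},"*_successMessages":{"ok":"hi"}}`)
	r, err := decodeResponse(data)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%+v\n", r)
}
```

This decodes the input twice, which is simple but not the cheapest option; for large payloads, decoding once into the map and filling the struct from it may be preferable.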

But I would strongly discourage using such ugly JSON keys. I understand that you are probably just trying to parse a JSON response whose format is outside your control, but you should push back against these keys where you can, as they will only cause more problems later on (e.g. if they are stored in a database, it will be very hard to spot the '\u0000' characters when inspecting records, as they may be displayed as nothing).

icza answered Sep 30 '22 05:09
