
How to decompress a []byte content in gzip format that gives an error when unmarshaling

Tags: utf-8, go

I'm making a request to an API and reading the response body into a []byte with ioutil.ReadAll(resp.Body). When I try to unmarshal this content, json.Unmarshal returns an error, so it seems the content is not UTF-8 encoded. This is what I'm trying:

package main

import (
    "encoding/json"
    "fmt"

    "some/api"
)

func main() {
    content := api.SomeAPI.SomeRequest() // []byte variable
    var data interface{}
    err := json.Unmarshal(content, &data)
    if err != nil {
        panic(err.Error())
    }
    fmt.Println("Data from response", data)
}

The error I get is: invalid character '\x1f' looking for beginning of value. For the record, the response headers include Content-Type:[application/json; charset=utf-8].

How can I decode content to avoid these invalid characters when unmarshaling?

Edit

This is the hexdump of content: play.golang.org/p/oJ5mqERAmj

Fernando Á. asked Oct 07 '13 15:10


1 Answer

Judging by your hex dump, you are receiving gzip-encoded data, so you'll need to use compress/gzip to decode it first.

Try something like this:

package main

import (
    "bytes"
    "compress/gzip"
    "encoding/json"
    "fmt"
    "io"
    "some/api"
)

func main() {
    content := api.SomeAPI.SomeRequest() // []byte variable

    // wrap the content in a gzip reader, which decompresses it as it is read
    buf := bytes.NewBuffer(content)
    reader, err := gzip.NewReader(buf)
    if err != nil {
        panic(err)
    }
    defer reader.Close()

    // Use the stream interface to decode json from the io.Reader
    var data interface{}
    dec := json.NewDecoder(reader)
    err = dec.Decode(&data)
    if err != nil && err != io.EOF {
        panic(err)
    }
    fmt.Println("Data from response", data)
}
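If it is not certain that the body is always compressed, one option is to look for the gzip magic number (the bytes 0x1f 0x8b, which is where the '\x1f' in the error message comes from) before decoding. The following is only a minimal sketch: maybeGunzip and gzipBytes are hypothetical helper names, not part of any library, the sample payload is made up to stand in for the API response, and io.ReadAll assumes Go 1.16 or later.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// maybeGunzip is a hypothetical helper: it returns b decompressed if b starts
// with the gzip magic number (0x1f 0x8b), and b unchanged otherwise.
func maybeGunzip(b []byte) ([]byte, error) {
	if len(b) < 2 || b[0] != 0x1f || b[1] != 0x8b {
		return b, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// gzipBytes is a hypothetical helper used only to build sample input.
func gzipBytes(b []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(b); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	// Made-up payload standing in for the compressed API response.
	content := gzipBytes([]byte(`{"ok":true}`))

	plain, err := maybeGunzip(content)
	if err != nil {
		panic(err)
	}
	var data interface{}
	if err := json.Unmarshal(plain, &data); err != nil {
		panic(err)
	}
	fmt.Println("Data from response", data)
}
```

One possible cause, which the question does not confirm: Go's net/http client decompresses gzip responses transparently only when it added the Accept-Encoding header itself. If the request sets Accept-Encoding: gzip explicitly, the caller receives the compressed bytes and has to decompress them as above.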

Previous answer

Character \x1f is the unit separator character in ASCII and UTF-8. It never appears as part of a multi-byte UTF-8 sequence, but it can be used to mark off different pieces of text. A string containing \x1f can be valid UTF-8 but, as far as I know, is not valid JSON, because JSON requires control characters inside strings to be escaped.
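A quick way to see that distinction is to run the same text through utf8.Valid and json.Unmarshal. This is a small sketch rather than anything from the API in question; checkJSON is a hypothetical helper name and the sample strings are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// checkJSON is a hypothetical helper: it reports whether b is valid UTF-8
// and returns the error (if any) from parsing b as JSON.
func checkJSON(b []byte) (validUTF8 bool, jsonErr error) {
	var v interface{}
	return utf8.Valid(b), json.Unmarshal(b, &v)
}

func main() {
	// A JSON string containing the raw control byte 0x1f.
	raw := []byte("\"hello\x1fGoodbye\"")
	// The same character written with a JSON escape sequence.
	escaped := []byte(`"hello\u001fGoodbye"`)

	ok, err := checkJSON(raw)
	fmt.Println(ok, err != nil) // true true: valid UTF-8, rejected as JSON

	ok, err = checkJSON(escaped)
	fmt.Println(ok, err != nil) // true false: valid UTF-8, accepted as JSON
}
```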

I think you need to read the API specification closely to find out what they are using the \x1f markers for, but in the meantime you could try removing them and see what happens, e.g.

package main

import (
    "bytes"
    "fmt"
)

func main() {
    b := []byte("hello\x1fGoodbye")
    fmt.Printf("b was %q\n", b)
    // replace every unit separator (0x1f) with a space
    b = bytes.Replace(b, []byte{0x1f}, []byte{' '}, -1)
    fmt.Printf("b is now %q\n", b)
}

Prints

b was "hello\x1fGoodbye"
b is now "hello Goodbye"

Nick Craig-Wood answered Oct 11 '22 02:10