Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to read packed binary data in Go?

Tags:

go

binaryfiles

I'm trying to figure out the best way to read a packed binary file in Go that was produced by Python like the following:

import struct
f = open('tst.bin', 'wb')
fmt = 'iih' #please note this is packed binary: 4byte int, 4byte int, 2byte int
f.write(struct.pack(fmt,4, 185765, 1020))
f.write(struct.pack(fmt,4, 185765, 1022))
f.close()

I have been tinkering with some of the examples I've seen on Github.com and a few other sources but I can't seem to get anything working correctly (update shows working method). What is the idiomatic way to do this sort of thing in Go? This is one of several attempts

UPDATE and WORKING

package main

    import (
            "fmt"
            "os"
            "encoding/binary"
            "io"
            )

    func main() {
            fp, err := os.Open("tst.bin")

            if err != nil {
                    panic(err)
            }

            defer fp.Close()

            lineBuf := make([]byte, 10) //4 byte int, 4 byte int, 2 byte int per line

            for true {
                _, err := fp.Read(lineBuf)

                if err == io.EOF{
                    break
                }

                aVal := int32(binary.LittleEndian.Uint32(lineBuf[0:4])) // same as: int32(uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24)
                bVal := int32(binary.LittleEndian.Uint32(lineBuf[4:8]))
                cVal := int16(binary.LittleEndian.Uint16(lineBuf[8:10])) //same as: int16(uint32(b[0]) | uint32(b[1])<<8)
                fmt.Println(aVal, bVal, cVal)
            }
    }
like image 240
Chris Townsend Avatar asked Dec 03 '15 23:12

Chris Townsend


People also ask

What is packed binary format?

In data structures, packed binary data usually means that more (if not all available) bit combinations are used to encode some values, while unpacked means that some bit combinations remain unused, either to improve readability or to make certain calculations easier (but unpacked data takes more space).

What is byte in Golang?

The byte type in Golang is an alias for the unsigned integer 8 type ( uint8 ). The byte type is only used to semantically distinguish between an unsigned integer 8 and a byte. The range of a byte is 0 to 255 (same as uint8 ).

What is binary read in Python?

You can read the particular number of bytes or the full content of the binary file at a time. Create a python file with the following script. The open() function has used to open the string. bin for reading. The read() function has been used to read 7 characters from the file in each iteration of while loop and print.


2 Answers

A well portable and rather easy way to handle the problem are Google's "Protocol Buffers". Though this is too late now since you got it working, I took some effort in explaining and coding it, so I am posting it anyway.

You can find the code on https://github.com/mwmahlberg/ProtoBufDemo

You need to install the protocol buffers for python using your preferred method (pip, OS package management, source) and for Go

The .proto file

The .proto file is rather simple for our example. I called it data.proto

syntax = "proto2";
package main;

message Demo {
  required uint32  A = 1;
  required uint32 B = 2;

  // A shortcomning: no 16 bit ints
  // We need to make this sure in the applications
  required uint32 C = 3;
}

Now you need to call protoc on the file and have it provide the code for both Python and Go:

protoc --go_out=. --python_out=. data.proto

which generates the files data_pb2.py and data.pb.go. Those files provide the language specific access to the protocol buffer data.

When using the code from github, all you need to do is to issue

go generate

in the source directory.

The Python code

import data_pb2

def main():

    # We create an instance of the message type "Demo"...
    data = data_pb2.Demo()

    # ...and fill it with data
    data.A = long(5)
    data.B = long(5)
    data.C = long(2015)


    print "* Python writing to file"
    f = open('tst.bin', 'wb')

    # Note that "data.SerializeToString()" counterintuitively
    # writes binary data
    f.write(data.SerializeToString())
    f.close()

    f = open('tst.bin', 'rb')
    read = data_pb2.Demo()
    read.ParseFromString(f.read())
    f.close()

    print "* Python reading from file"
    print "\tDemo.A: %d, Demo.B: %d, Demo.C: %d" %(read.A, read.B, read.C)

if __name__ == '__main__':
    main()

We import the file generated by protoc and use it. Not much magic here.

The Go File

package main

//go:generate protoc --python_out=. data.proto
//go:generate protoc --go_out=. data.proto
import (
    "fmt"
    "os"

    "github.com/golang/protobuf/proto"
)

func main() {

    // Note that we do not handle any errors for the sake of brevity
    d := Demo{}
    f, _ := os.Open("tst.bin")
    fi, _ := f.Stat()

    // We create a buffer which is big enough to hold the entire message
    b := make([]byte,fi.Size())

    f.Read(b)

    proto.Unmarshal(b, &d)
    fmt.Println("* Go reading from file")

    // Note the explicit pointer dereference, as the fields are pointers to a pointers
    fmt.Printf("\tDemo.A: %d, Demo.B: %d, Demo.C: %d\n",*d.A,*d.B,*d.C)
}

Note that we do not need to explicitly import, as the package of data.proto is main.

The result

After generation the required files and compiling the source, when you issue

$ python writer.py && ./ProtoBufDemo

the result is

* Python writing to file
* Python reading from file
    Demo.A: 5, Demo.B: 5, Demo.C: 2015
* Go reading from file
    Demo.A: 5, Demo.B: 5, Demo.C: 2015

Note that the Makefile in the repository offers a shorcut for generating the code, compiling the .go files and run both programs:

make run
like image 101
Markus W Mahlberg Avatar answered Oct 16 '22 09:10

Markus W Mahlberg


The Python format string is iih, meaning two 32-bit signed integers and one 16-bit signed integer (see the docs). You can simply use your first example but change the struct to:

type binData struct {
    A int32
    B int32
    C int16
}

func main() {
        fp, err := os.Open("tst.bin")

        if err != nil {
                panic(err)
        }

        defer fp.Close()

        for {
            thing := binData{}
            err := binary.Read(fp, binary.LittleEndian, &thing)

            if err == io.EOF{
                break
            }

            fmt.Println(thing.A, thing.B, thing.C)
        }
}

Note that the Python packing didn't specify the endianness explicitly, but if you're sure the system that ran it generated little-endian binary, this should work.

Edit: Added main() function to explain what I mean.

Edit 2: Capitalized struct fields so binary.Read could write into them.

like image 31
mjois Avatar answered Oct 16 '22 09:10

mjois