Read mongodump output with go and mgo

Tags: mongodb, go, mgo

I'm trying to read a collection dump generated by mongodump. The file is a few gigabytes so I want to read it incrementally.

I can read the first object with something like this:

buf := make([]byte, 100000)
f, _ := os.Open(path)
f.Read(buf)

var m bson.M
bson.Unmarshal(buf, &m)

However I don't know how much of the buf was consumed, so I don't know how to read the next one.

Is this possible with mgo?

Nick Keets asked Jun 13 '14 04:06


2 Answers

Using mgo's bson.Unmarshal() alone is not enough -- that function is designed to take a []byte representing a single document, and unmarshal it into a value.

You will need a function that can read the next whole document from the dump file; you can then pass the result to bson.Unmarshal().

Compared to encoding/json or encoding/gob, it would be convenient if mgo's bson package had a Reader type that consumed documents from an io.Reader.

Anyway, from the source for mongodump, it looks like the dump file is just a series of bson documents, with no file header/footer or explicit record separators.

BSONTool::processFile shows how mongorestore reads the dump file: it reads 4 bytes to determine the length of the document, then uses that size to read the rest of the document. The BSON spec confirms that this size prefix is part of every document: a little-endian int32 holding the total document length in bytes, the prefix itself included.
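As a small illustration of that length prefix, here is a sketch of decoding it with the standard library. The helper name docLength is hypothetical and not part of mgo; the sample bytes are the prefix of a 22-byte document.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// docLength (hypothetical helper) decodes the 4-byte little-endian length
// prefix of a BSON document. The value counts the whole document,
// including the 4 prefix bytes themselves.
func docLength(prefix [4]byte) int32 {
	return int32(binary.LittleEndian.Uint32(prefix[:]))
}

func main() {
	// 16 00 00 00 is the prefix of a 22-byte document.
	fmt.Println(docLength([4]byte{0x16, 0x00, 0x00, 0x00})) // prints 22
}
```

Because the count includes the prefix, only length-4 more bytes remain to be read after the prefix.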

Here is a playground example that shows how this could be done in Go: read the length field, read the rest of the document, unmarshal, repeat.
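In case the playground link is unavailable, here is a minimal sketch of that loop using only the standard library. It is not the original playground code: readNextDoc, docLengths and maxDocSize are hypothetical names, and the 16 MB cap is an assumed sanity limit based on MongoDB's default maximum document size.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxDocSize is an assumed sanity limit, not something the dump format states.
const maxDocSize = 16 * 1024 * 1024

// readNextDoc reads one complete BSON document from r and returns its raw
// bytes, length prefix included. It returns io.EOF when the stream ends
// cleanly between two documents.
func readNextDoc(r io.Reader) ([]byte, error) {
	var prefix [4]byte
	if _, err := io.ReadFull(r, prefix[:]); err != nil {
		return nil, err // io.EOF at a boundary, io.ErrUnexpectedEOF if cut short
	}
	size := int(binary.LittleEndian.Uint32(prefix[:]))
	if size < 5 || size > maxDocSize {
		return nil, fmt.Errorf("implausible document length %d", size)
	}
	doc := make([]byte, size)
	copy(doc, prefix[:])
	if _, err := io.ReadFull(r, doc[4:]); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF // the stream ended inside a document
		}
		return nil, err
	}
	return doc, nil
}

// docLengths is a demo helper: it walks an in-memory dump and returns the
// length of every document it finds.
func docLengths(dump []byte) ([]int, error) {
	r := bytes.NewReader(dump)
	var lens []int
	for {
		doc, err := readNextDoc(r)
		if err == io.EOF {
			return lens, nil
		}
		if err != nil {
			return lens, err
		}
		// With mgo, this is where bson.Unmarshal(doc, &m) would be called.
		lens = append(lens, len(doc))
	}
}

func main() {
	// Two empty BSON documents back to back: length 5, then the 0x00 terminator.
	dump := []byte{5, 0, 0, 0, 0, 5, 0, 0, 0, 0}
	lens, err := docLengths(dump)
	fmt.Println(lens, err) // prints [5 5] <nil>
}
```

For a multi-gigabyte dump, the same readNextDoc could be fed a bufio.Reader wrapped around the opened file, so that only one document is held in memory at a time.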

lnmx answered Oct 05 '22 09:10


The method File.Read returns the number of bytes read.

File.Read

Read reads up to len(b) bytes from the File. It returns the number of bytes read and an error, if any. EOF is signaled by a zero count with err set to io.EOF.

So you can get the number of bytes read by simply storing the return values of your read:

n, err := f.Read(buf)

Elwinar answered Oct 05 '22 08:10