I want to make a partial dump of a MongoDB database (partial as in, I need to filter some documents and some fields). Then, this dump will be imported onto another server.
I cannot use the mongodump utility, as it doesn't allow filtering fields.
I could use the mongoexport utility, as it allows filtering both documents and fields. However, the documentation states that mongoexport can only output a JSON file and:
does not reliably preserve all rich BSON data types, because JSON can only represent a subset of the types supported by BSON.
This is possible using Views, without resorting to writing a low-level implementation that reads and writes the BSON content. There are also options which do in fact preserve types even when using JSON formats, and you don't even need a "View" for that.
mongodump
The basic premise is to create a View which only returns the content you want. A View can be the result of any aggregation pipeline expression.
For example, given a simple document in a collection:
db.test.insert({ "a": 1, "b": 2, "c": 3 })
You can create the View on that collection, with just wanted fields:
db.createView("testView", "test", [{ "$project": { "a": 1, "b": 2 } }])
Then, exiting the mongo shell, you can access the View from mongodump using the --viewsAsCollections option:
mongodump --db test --collection testView --viewsAsCollections
This exports just the named "collection" (actually a View). The --viewsAsCollections option means that instead of mongodump returning just the view definition (essentially the aggregation pipeline), it returns the results as if it were a real collection.
The resulting BSON content can then be loaded via mongorestore:
mongorestore --db other --collection test dump/test/testView.bson
The content from the BSON dump is then written into the target database on the host you are connecting to, under the specified collection name:
use other
db.test.find()
{ "_id" : ObjectId("5bfb3e0eadd1d8af906ad140"), "a" : 1, "b" : 2 }
Note also that, as a View, the aggregation pipeline can really be anything, so $match stages can filter documents and you can transform or even actually "aggregate" however you want.
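To illustrate what such a pipeline computes, here is a plain-Python sketch of the $match and $project filtering logic over a list of dicts. This is purely illustrative (the predicate and field names are made up for the example); it is not anything MongoDB itself runs.

```python
# Sketch of what a View's $match + $project stages do, over plain dicts.
# Illustration of the filtering logic only, not a MongoDB API.

def match(docs, predicate):
    """Emulate a $match stage: keep only documents satisfying the predicate."""
    return [doc for doc in docs if predicate(doc)]

def project(docs, fields):
    """Emulate a $project stage: keep only the listed fields (plus _id)."""
    keep = set(fields) | {"_id"}
    return [{k: v for k, v in doc.items() if k in keep} for doc in docs]

collection = [
    {"_id": 1, "a": 1, "b": 2, "c": 3},
    {"_id": 2, "a": 5, "b": 6, "c": 7},
]

# Roughly equivalent to the pipeline:
#   [{"$match": {"a": {"$gte": 2}}}, {"$project": {"a": 1, "b": 1}}]
view = project(match(collection, lambda d: d["a"] >= 2), ["a", "b"])
print(view)  # [{'_id': 2, 'a': 5, 'b': 6}]
```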
--fields with mongoexport
In much the same way, the mongoexport utility can also access the content from a View.
Despite this not being "strict BSON", there is a standard with MongoDB which does in fact preserve the data types. This is covered in the documentation under MongoDB Extended JSON.
So this is NOT a binary format, and as JSON it does take considerably more storage space, but the necessary type information is indeed there.
For example:
db.mixed.insert({
  "a": NumberLong(1),
  "b": NumberDecimal("123.45"),
  "c": new Date(),
  "d": "unwanted"
})
Which would appear in the mongo shell as:
{
  "_id" : ObjectId("5bfb428790b2b4e4241a015c"),
  "a" : NumberLong(1),
  "b" : NumberDecimal("123.45"),
  "c" : ISODate("2018-11-26T00:47:03.033Z"),
  "d" : "unwanted"
}
You can still set up a View:
db.createView("mixedView", "mixed", [{ "$project": { "a": 1, "b": 1, "c": 1 } }])
And the export will just pick up the data:
mongoexport --db test --collection mixedView > out.json
{
  "_id": {
    "$oid": "5bfb428790b2b4e4241a015c"
  },
  "a": {
    "$numberLong": "1"
  },
  "b": {
    "$numberDecimal": "123.45"
  },
  "c": {
    "$date": "2018-11-26T00:47:03.033Z"
  }
}
Or the same thing on the original collection, just using --fields for selection:
mongoexport --db test --collection mixed --fields a,b,c > out.json
With exactly the same output. The only restriction is that the --query option only supports a regular query expression, as given to find() or similar. This is not as flexible as a View, but can do rudimentary filtering for most needs.
The Extended JSON format is recognized by mongoimport, and there are also parser implementations available for many languages which recognize it. As the content is read, it is inserted into the target collection with the "type" information preserved:
mongoimport --db other --collection mixed out.json
And then viewing the data:
use other
db.mixed.findOne()
{
  "_id" : ObjectId("5bfb428790b2b4e4241a015c"),
  "a" : NumberLong(1),
  "b" : NumberDecimal("123.45"),
  "c" : ISODate("2018-11-26T00:47:03.033Z")
}
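As a sketch of what such a parser does, a few of the common Extended JSON type wrappers can be decoded with nothing but the Python standard library. This is a minimal illustration, not a full implementation; real drivers ship a complete codec (for example, bson.json_util in PyMongo):

```python
import json
from datetime import datetime
from decimal import Decimal

def decode_extended(obj):
    """json.loads object_hook that unwraps a few common Extended JSON types.

    Minimal sketch: only $numberLong, $numberDecimal and $date are handled.
    """
    if len(obj) == 1:
        if "$numberLong" in obj:
            return int(obj["$numberLong"])
        if "$numberDecimal" in obj:
            return Decimal(obj["$numberDecimal"])
        if "$date" in obj:
            # %z accepts the trailing "Z" on Python 3.7+
            return datetime.strptime(obj["$date"], "%Y-%m-%dT%H:%M:%S.%f%z")
    return obj

line = ('{"a": {"$numberLong": "1"}, "b": {"$numberDecimal": "123.45"}, '
        '"c": {"$date": "2018-11-26T00:47:03.033Z"}}')
doc = json.loads(line, object_hook=decode_extended)
print(doc)
```

Because object_hook is applied to every nested object as it is parsed, the type wrappers are replaced by native values before the enclosing document is assembled, which is essentially how the "type" information survives the round trip through plain JSON.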
So it is possible, and the Extended JSON format exists precisely for data interchange in situations where sending binary content may not be viable or even desirable, but maintaining the "type" information is.
Overall, there are many options you can use without needing to resort to reading and writing binary BSON formats, or any other complex binary format, to store the data in between transfers.
As a note on the "vague" passage, the actual supported BSON types are listed within the Extended JSON page of the documentation. You can even compare this to the BSON specification to see that, despite the "cautious" statement, the common data types you will use are all supported there. While some external implementations of that spec may not support ALL of them, the bundled utilities such as mongoexport and mongoimport are indeed compliant.