Import CSV data as array in MongoDB using mongoimport

I have been trying to import CSV data into MongoDB using mongoimport. The documents in the collection look like this:

{
id:"122234343",
name: "name1",
children: ["222334444","333344444"]
}

One approach I tried is to create two CSV files: one with id and name, and another with id and children (if an id has two children it gets two rows). I import both files into separate collections with mongoimport, and then iterate the second collection with forEach() to update the first collection with the children data, roughly as in the sketch below.
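
For reference, the merge step of that two-collection workaround would look roughly like this in the mongo shell. The collection names parents and childrows and the file names are placeholders for illustration only:

// e.g. mongoimport -d test -c parents --headerline --type csv parents.csv
//      mongoimport -d test -c childrows --headerline --type csv children.csv

// Push each (id, children) row into the matching parent document
db.childrows.find().forEach(function(doc) {
  db.parents.updateOne(
    { "id": doc.id },
    { "$push": { "children": doc.children } }
  );
});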

Is there another way to populate this "children" array directly from the CSV?

asked Jun 19 '17 by tarun jaggi



1 Answer

To me, the simplest way to work out how to format your CSV for mongoimport is to create a collection and then run mongoexport on it, to see what the CSV should look like.

So create your document from the shell:

db.testcol.insert({
  id: "122234343",
  name: "name1",
  children: ["222334444","333344444"]
})

Then exit the shell and run mongoexport:

 mongoexport -d test -c testcol --fields id,name,children --type csv > out.csv

Which will show you the output as:

id,name,children
122234343,name1,"[""222334444"",""333344444""]"

Here the array is represented as a quoted string, with the inner quotes escaped in standard CSV fashion by doubling them ("").
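
This also means that if you write the CSV by hand rather than exporting it, each children cell is just the JSON array inside CSV quotes, with the inner quotes doubled. For example, with a second made-up row added for illustration:

id,name,children
122234343,name1,"[""222334444"",""333344444""]"
122234344,name2,"[""444455555"",""555566666""]"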

That gives a clear format to feed back to mongoimport, so import the file to test:

mongoimport -d test -c newcol --headerline --type csv out.csv

Re-enter the shell and see the document(s) in the new collection:

db.newcol.findOne()
{
        "_id" : ObjectId("59476924d6eb0b7d6ac37e02"),
        "id" : 122234343,
        "name" : "name1",
        "children" : "[\"222334444\",\"333344444\"]"
}

So everything is there, but children is a string rather than an array. That is not really a problem: the data is imported, and now it is just up to us to transform it:

var ops = [];

// Find the imported documents where children is still a string (BSON type 2)
db.newcol.find({ "children": { "$type": 2 } }).forEach(doc => {
  // Split on the commas, then strip the quotes, brackets and backslashes from each element
  var children = doc.children.split(',').map( e => e.replace(/"|\[|\]|\\/gm,'').toString() );
  ops.push({
    "updateOne": {
      "filter": { "_id": doc._id },
      "update": { "$set": { "children": children } }
    }
  });

  // Commit the queued updates in batches of 1000
  if ( ops.length >= 1000 ) {
    db.newcol.bulkWrite(ops);
    ops = [];
  }
});

// Flush any remaining updates
if ( ops.length > 0 ) {
  db.newcol.bulkWrite(ops);
  ops = [];
}

That iterates everything in the imported collection whose children value has a BSON type of 2, which is "string", matched via the $type query operator.
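
As a side note, on MongoDB 3.2 and later the same condition can also be written with the type's string alias rather than the numeric code:

db.newcol.find({ "children": { "$type": "string" } })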

Each string is then split on the commas and stripped of the other characters, leaving only the values you want in an array.
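
To see what that line does, run it on the sample value directly in the shell (the variable name s is only for illustration):

var s = "[\"222334444\",\"333344444\"]";
s.split(',').map( e => e.replace(/"|\[|\]|\\/gm,'').toString() )
// [ "222334444", "333344444" ]

Since the stored string is itself valid JSON, JSON.parse(doc.children) would be an equally reasonable way to turn it back into an array.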

Using .bulkWrite() commits those updates efficiently, sending them to the server in batches of 1000 rather than issuing one write per document.

The end result is the document in the wanted form (note that mongoimport inferred id as a number here; mongoimport's --columnsHaveTypes option can keep it as a string if that matters):

db.newcol.findOne()
{
        "_id" : ObjectId("59476924d6eb0b7d6ac37e02"),
        "id" : 122234343,
        "name" : "name1",
        "children" : [
                "222334444",
                "333344444"
        ]
}

So that is my step by step for working out your CSV format, importing it, and then transforming the data into the state you need.

answered Nov 29 '22 by Neil Lunn