Mongoimport csv files with string _id and upsert

I'm trying to use mongoimport to upsert data with string values in _id. Since the ids look like integers (even though they're in quotes), mongoimport treats them as integers and creates new records instead of upserting the existing records.

Command I'm running:

mongoimport --host localhost --db database --collection my_collection --type csv --file mydata.csv --headerline --upsert

Example record from mydata.csv (shown as the document it represents):

{ "_id" : "0364", someField: "value" }

As a result, mongoimport inserts a new record like { "_id" : 364, someField: "value" } instead of updating the existing record with _id "0364".
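To see why the upsert misses, here is a minimal Python sketch of number-like type inference. `infer_csv_value` is a hypothetical helper, not mongoimport's implementation. It assumes only what the question describes: a field that looks like a number is converted to a number.

```python
def infer_csv_value(raw: str):
    """Hypothetical, simplified stand-in for type inference on a CSV field.

    This is NOT mongoimport's real code; it only shows why a number-like
    string such as "0364" cannot survive a conversion through int().
    """
    try:
        return int(raw)
    except ValueError:
        try:
            return float(raw)
        except ValueError:
            return raw


existing_id = "0364"                   # _id already stored in the collection
imported_id = infer_csv_value("0364")  # what an inferring importer produces

print(repr(imported_id))           # 364 (an int, the leading zero is gone)
print(imported_id == existing_id)  # False: the upsert filter never matches
```

Because 364 and "0364" differ in both type and value, the upsert finds no matching _id and inserts a new document.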

Does anyone know how to make it treat the _id as strings?

Things that don't work:

  • Surrounding the data with double double quotes ""0364"", double and single quotes "'0364'" or '"0364"'
  • Appending empty string to value: { "_id" : "0364" + "", someField: "value" }
asked Apr 24 '12 16:04 by Paweł Krupiński




1 Answer

Unfortunately, there is currently no way to force mongoimport to interpret number-like values as strings; see the related issue:

https://jira.mongodb.org/browse/SERVER-3731

You could write a script in Python or another language you're comfortable with. Python's csv module reads every field as a string, so "0364" stays "0364". Something along these lines:

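import csv

from pymongo import MongoClient

client = MongoClient()  # connects to localhost:27017 by default
collection = client.mydatabase.mycollection

with open('myfile.csv', newline='') as f:
    for line in csv.DictReader(f):
        print('_id', line['_id'])
        upsert_fields = {
            '_id': line['_id'],
            'my_other_upsert_field': line['my_other_upsert_field']}

        # Replace the matching document, or insert it if none matches.
        # Writes are acknowledged by default, so safe=True is no longer needed.
        collection.replace_one(upsert_fields, line, upsert=True)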
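If you want to check the parsing without a running MongoDB, the CSV-to-upsert step can be isolated in a pure function. This is a minimal sketch: `build_upserts` and its `key_fields` parameter are hypothetical names, and `replace_one(..., upsert=True)` is the pymongo method corresponding to the old `update(spec, document, upsert=True)` call.

```python
import csv
import io


def build_upserts(csv_text, key_fields=('_id',)):
    """Turn CSV text into (filter, replacement) pairs for an upsert.

    csv.DictReader returns every field as a str, so "0364" stays "0364".
    key_fields (hypothetical parameter) names the columns that identify a
    document, e.g. ('_id', 'my_other_upsert_field') as in the script above.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        filter_doc = {field: row[field] for field in key_fields}
        pairs.append((filter_doc, dict(row)))
    return pairs


sample = "_id,someField\n0364,value\n"
for filter_doc, replacement in build_upserts(sample):
    print(filter_doc, replacement)
    # With pymongo, each pair would be applied as:
    # collection.replace_one(filter_doc, replacement, upsert=True)
```

Keeping the parsing separate from the database call makes it easy to test. The pairs could also be batched as pymongo `ReplaceOne(filter_doc, replacement, upsert=True)` operations and sent with `collection.bulk_write(...)`.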
answered Oct 12 '22 09:10 by A. Jesse Jiryu Davis