Batch import of JSON documents to Apache CouchDB

I have approximately 250,000 JSON-formatted files, each with one object in it (formatted just how CouchDB likes it with _id). What's the best way to import these into my remote CouchDB server as records?

- I am on a Windows XP machine.

- I have internet access, but I can't set up a CouchDB server on my local machine and make it accessible from the web (firewall constraints), so there is no easy replication.

Nate asked Jul 16 '10 19:07


1 Answer

I would strongly suggest looking into the bulk document API, described on the CouchDB wiki: http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API

Basically, you make a POST request to /someDatabase/_bulk_docs that looks like this:

{
  "docs": [
    { "_id": "awsdflasdfsadf", "foo": "bar" },
    { "_id": "cczsasdfwuhfas", "bwah": "there" },
    ...
  ]
}

As with any other POST request, if you don't include _id properties, CouchDB will generate them for you.
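For the situation in the question (many files, one document per file), the request bodies could be assembled in batches rather than as one huge payload. The following is a minimal Python sketch, not part of the original answer: the function name `iter_bulk_payloads`, the `*.json` file pattern, and the batch size of 1000 are all assumptions chosen for illustration.

```python
import json
from pathlib import Path
from typing import Iterator, List


def iter_bulk_payloads(directory: str, batch_size: int = 1000) -> Iterator[str]:
    """Yield request bodies of the form {"docs": [...]}, one per batch.

    Assumes every *.json file in `directory` holds exactly one JSON object.
    The batch size is an arbitrary illustrative default, not a CouchDB limit.
    """
    batch: List[dict] = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            batch.append(json.load(handle))
        if len(batch) >= batch_size:
            yield json.dumps({"docs": batch})
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield json.dumps({"docs": batch})
```

Each yielded string is a complete body for one POST to /someDatabase/_bulk_docs, so 250,000 files with a batch size of 1000 would produce 250 requests.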

You can use the same operation to update a batch of documents: just include each document's current _rev property alongside its _id. To delete any of the documents in the batch, add a "_deleted": true property to them.
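As a rough illustration of how update and delete entries might be prepared for the "docs" array, here is a small Python sketch. The helper names `mark_for_update` and `mark_for_deletion` and the revision strings in the usage example are hypothetical; real _rev values have to come from the server.

```python
import json


def mark_for_update(doc: dict, rev: str) -> dict:
    """Return a copy of `doc` carrying the revision it is meant to replace."""
    updated = dict(doc)
    updated["_rev"] = rev
    return updated


def mark_for_deletion(doc_id: str, rev: str) -> dict:
    """Return a minimal entry that deletes the given document revision."""
    return {"_id": doc_id, "_rev": rev, "_deleted": True}


# Placeholder revision strings, for illustration only.
docs = [
    mark_for_update({"_id": "awsdflasdfsadf", "foo": "baz"}, "1-abc"),
    mark_for_deletion("cczsasdfwuhfas", "1-def"),
]
body = json.dumps({"docs": docs})
```

Updates and deletions can share one request body with new documents, since each entry in "docs" is handled individually.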

If you have a JSON file containing your documents in this format and use curl, the request could look like:

curl -H "Content-Type: application/json" --data-binary @/home/xxx/data.json https://usr:pwd@host:5984/someDatabase/_bulk_docs/
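If curl is not available (for example on a Windows machine), the same request could be sent from Python's standard library. This is a minimal sketch under assumptions, not the answerer's method: the function names `build_bulk_request` and `post_bulk` are made up for illustration, HTTP Basic authentication is assumed to match the usr:pwd form in the curl URL, and the host and credentials are the same placeholders as above.

```python
import base64
import json
import urllib.request


def build_bulk_request(base_url: str, database: str, user: str,
                       password: str, body: str) -> urllib.request.Request:
    """Build a POST to <base_url>/<database>/_bulk_docs with Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{database}/_bulk_docs",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def post_bulk(request: urllib.request.Request) -> list:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call might look like `post_bulk(build_bulk_request("https://host:5984", "someDatabase", "usr", "pwd", body))`, where `body` is a JSON string of the {"docs": [...]} form. The response should contain one result per submitted document, so it is worth checking each entry for an error field rather than relying on the HTTP status alone.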

Cheers.

Sam Bisbee answered Sep 20 '22 15:09
