
How to import data from mongodb to pandas?

I have a large amount of data in a MongoDB collection that I need to analyze. How do I import that data into pandas?

I am new to pandas and numpy.

EDIT: The mongodb collection contains sensor values tagged with date and time. The sensor values are of float datatype.

Sample Data:

    {
        "_cls" : "SensorReport",
        "_id" : ObjectId("515a963b78f6a035d9fa531b"),
        "_types" : [ "SensorReport" ],
        "Readings" : [
            {
                "a" : 0.958069536790466,
                "_types" : [ "Reading" ],
                "ReadingUpdatedDate" : ISODate("2013-04-02T08:26:35.297Z"),
                "b" : 6.296118156595,
                "_cls" : "Reading"
            },
            {
                "a" : 0.95574014778624,
                "_types" : [ "Reading" ],
                "ReadingUpdatedDate" : ISODate("2013-04-02T08:27:09.963Z"),
                "b" : 6.29651468650064,
                "_cls" : "Reading"
            },
            {
                "a" : 0.953648289182713,
                "_types" : [ "Reading" ],
                "ReadingUpdatedDate" : ISODate("2013-04-02T08:27:37.545Z"),
                "b" : 7.29679823731148,
                "_cls" : "Reading"
            },
            {
                "a" : 0.955931884300997,
                "_types" : [ "Reading" ],
                "ReadingUpdatedDate" : ISODate("2013-04-02T08:28:21.369Z"),
                "b" : 6.29642922525632,
                "_cls" : "Reading"
            },
            {
                "a" : 0.95821381,
                "_types" : [ "Reading" ],
                "ReadingUpdatedDate" : ISODate("2013-04-02T08:41:20.801Z"),
                "b" : 7.28956613,
                "_cls" : "Reading"
            },
            {
                "a" : 4.95821335,
                "_types" : [ "Reading" ],
                "ReadingUpdatedDate" : ISODate("2013-04-02T08:41:36.931Z"),
                "b" : 6.28956574,
                "_cls" : "Reading"
            },
            {
                "a" : 9.95821341,
                "_types" : [ "Reading" ],
                "ReadingUpdatedDate" : ISODate("2013-04-02T08:42:09.971Z"),
                "b" : 0.28956488,
                "_cls" : "Reading"
            },
            {
                "a" : 1.95667927,
                "_types" : [ "Reading" ],
                "ReadingUpdatedDate" : ISODate("2013-04-02T08:43:55.463Z"),
                "b" : 0.29115237,
                "_cls" : "Reading"
            }
        ],
        "latestReportTime" : ISODate("2013-04-02T08:43:55.463Z"),
        "sensorName" : "56847890-0",
        "reportCount" : 8
    }
Nithin asked Apr 27 '13 07:04



2 Answers

pymongo can help here. This is the code I use:

    import pandas as pd
    from pymongo import MongoClient


    def _connect_mongo(host, port, username, password, db):
        """Connect to MongoDB and return a handle to the given database."""
        if username and password:
            mongo_uri = 'mongodb://%s:%s@%s:%s/%s' % (username, password, host, port, db)
            conn = MongoClient(mongo_uri)
        else:
            conn = MongoClient(host, port)
        return conn[db]


    def read_mongo(db, collection, query=None, host='localhost', port=27017,
                   username=None, password=None, no_id=True):
        """Read from MongoDB and store the result in a DataFrame."""
        # Connect to MongoDB
        db = _connect_mongo(host=host, port=port, username=username, password=password, db=db)

        # Query the given collection; a query of None matches every document
        cursor = db[collection].find(query or {})

        # Expand the cursor and construct the DataFrame
        df = pd.DataFrame(list(cursor))

        # Drop the _id column if present (an empty result has no columns)
        if no_id and '_id' in df.columns:
            del df['_id']

        return df
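Since the question mentions a large amount of data, note that `list(cursor)` holds every document in memory as a Python dict before the DataFrame is built. A minimal sketch of a chunked alternative follows. The helper name `cursor_to_dataframe` and the `chunk_size` default are hypothetical, not part of pymongo or pandas, and the generator of plain dicts stands in for `db[collection].find(query)` so the example runs without a MongoDB server.

```python
from itertools import islice

import pandas as pd


def cursor_to_dataframe(cursor, chunk_size=10_000, drop_id=True):
    """Build a DataFrame from any iterable of dicts, one chunk at a time.

    Hypothetical helper: `cursor` can be a pymongo cursor or any other
    iterable that yields dicts.
    """
    iterator = iter(cursor)
    frames = []
    while True:
        # Pull at most chunk_size documents from the iterator
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        frames.append(pd.DataFrame(chunk))

    if not frames:
        return pd.DataFrame()

    df = pd.concat(frames, ignore_index=True)
    if drop_id and "_id" in df.columns:
        df = df.drop(columns="_id")
    return df


# Stand-in for db[collection].find(query): a generator of plain dicts
fake_cursor = ({"_id": i, "a": i * 0.5, "b": float(i)} for i in range(25))
df = cursor_to_dataframe(fake_cursor, chunk_size=10)
print(df.shape)
```

Only one chunk of raw dicts is alive at a time, but the final DataFrame still has to fit in memory. If it does not, restricting the query or the returned fields on the MongoDB side is likely the better route.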
waitingkuo answered Oct 08 '22 06:10



You can load your MongoDB data into a pandas DataFrame using this code. It works for me; hopefully it will work for you too.

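    import pandas as pd
    from pymongo import MongoClient

    # MongoClient() with no arguments connects to localhost:27017
    client = MongoClient()

    # Replace database_name and collection_name with your own names
    db = client.database_name
    collection = db.collection_name

    # Fetch every document and build the DataFrame
    data = pd.DataFrame(list(collection.find()))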
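With the sample document from the question, loading the collection directly gives one row per report, with the whole `Readings` list in a single cell. A possible way to get one row per reading is `pd.json_normalize`, sketched below on a hand-trimmed copy of the sample (two readings, most fields dropped) so that it runs without a database. In practice the list would come from `list(collection.find())`, and pymongo would already return `ReadingUpdatedDate` as `datetime` objects rather than strings.

```python
import pandas as pd

# Trimmed copy of the sample document from the question (assumption:
# dates written as ISO strings here instead of BSON dates)
docs = [
    {
        "sensorName": "56847890-0",
        "Readings": [
            {
                "a": 0.958069536790466,
                "b": 6.296118156595,
                "ReadingUpdatedDate": "2013-04-02T08:26:35.297Z",
            },
            {
                "a": 0.95574014778624,
                "b": 6.29651468650064,
                "ReadingUpdatedDate": "2013-04-02T08:27:09.963Z",
            },
        ],
    }
]

# One row per element of "Readings", with sensorName repeated on each row
readings = pd.json_normalize(docs, record_path="Readings", meta=["sensorName"])

# Use the reading timestamp as a sorted index for time-series analysis
readings["ReadingUpdatedDate"] = pd.to_datetime(readings["ReadingUpdatedDate"])
readings = readings.set_index("ReadingUpdatedDate").sort_index()
print(readings)
```

The resulting frame has float columns `a` and `b` plus `sensorName`, indexed by timestamp, which fits the sensor analysis described in the question.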
saimadhu.polamuri answered Oct 08 '22 08:10
