Insert a Pandas Dataframe into mongodb using PyMongo

What is the quickest way to insert a pandas DataFrame into mongodb using PyMongo?

Attempts

db.myCollection.insert(df.to_dict()) 

gave an error

InvalidDocument: documents must have only string keys, the key was Timestamp('2013-11-23 13:31:00', tz=None)


 db.myCollection.insert(df.to_json()) 

gave an error

TypeError: 'str' object does not support item assignment


 db.myCollection.insert({id: df.to_json()}) 

gave an error

InvalidDocument: documents must have only string keys, key was <built-in function id>


df

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 150 entries, 2013-11-23 13:31:26 to 2013-11-23 13:24:07
Data columns (total 3 columns):
amount    150  non-null values
price     150  non-null values
tid       150  non-null values
dtypes: float64(2), int64(1)
Nyxynyx asked Nov 23 '13 20:11



2 Answers

The quickest way is to combine the insert_many method from PyMongo 3 with the 'records' orientation of the to_dict method:

db.collection.insert_many(df.to_dict('records')) 
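One thing to watch with this approach: to_dict('records') emits one dict per row and does not include the index, so the DatetimeIndex from the question would be dropped. A minimal sketch of keeping it as an ordinary field is below. The helper name frame_to_records, the field name "timestamp", the sample values, and the connection details in the comments are all assumptions for illustration, not part of the original answer.

```python
import pandas as pd


def frame_to_records(df, index_name="timestamp"):
    """Return one dict per row, with the index kept as a regular field.

    Hypothetical helper: `index_name` is an illustrative field name.
    """
    out = df.copy()
    out.index.name = index_name
    # Move the index into a column so that to_dict("records") includes it.
    out = out.reset_index()
    return out.to_dict("records")


# Small stand-in for the DataFrame in the question (values are made up).
df = pd.DataFrame(
    {"amount": [1.5, 2.0], "price": [800.0, 801.5], "tid": [1, 2]},
    index=pd.to_datetime(["2013-11-23 13:31:26", "2013-11-23 13:24:07"]),
)
records = frame_to_records(df)

# With a running server (URI and database name are assumptions):
# from pymongo import MongoClient
# db = MongoClient("mongodb://localhost:27017")["mydb"]
# db.myCollection.insert_many(records)
```

Every key in the resulting dicts is a string, which avoids the InvalidDocument error from the first attempt. The timestamp values are pandas Timestamp objects, which subclass datetime.datetime; it is worth confirming on your own PyMongo version that they are stored as dates.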
dieguico answered Sep 19 '22 07:09


I doubt there is a method that is both the quickest and simple. If you are not worried about data conversion, you can do

>>> import datetime
>>> import json
>>> import pandas as pd
>>> df = pd.DataFrame.from_dict({'A': {1: datetime.datetime.now()}})
>>> df
                           A
1 2013-11-23 21:14:34.118531

>>> records = json.loads(df.T.to_json()).values()
>>> db.myCollection.insert(records)

But in case you try to load data back, you'll get:

>>> df = read_mongo(db, 'myCollection')
>>> df
                     A
0  1385241274118531000
>>> df.dtypes
A    int64
dtype: object

so you'll have to convert the 'A' column back to datetimes, along with any other fields in your DataFrame that are not int, float or str. For this example:

>>> df['A'] = pd.to_datetime(df['A'])
>>> df
                           A
0 2013-11-23 21:14:34.118531
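The conversion back depends on the unit of the stored integers. The value shown above is in nanoseconds since the epoch, but to_json has a date_unit parameter, and other pandas versions or settings may write milliseconds instead. A small sketch that states the unit explicitly rather than relying on the default; the frame named loaded stands in for whatever read_mongo returns:

```python
import pandas as pd

# Stand-in for the frame loaded back from MongoDB: dates arrive as integers.
loaded = pd.DataFrame({"A": [1385241274118531000]})

# State the unit explicitly; use unit="ms" if the JSON was written in milliseconds.
loaded["A"] = pd.to_datetime(loaded["A"], unit="ns")
```

If the wrong unit is assumed, the result is a wildly incorrect date or an out-of-bounds error, so it is worth checking one value by hand.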
alko answered Sep 19 '22 07:09