What is the quickest way to insert a pandas DataFrame into MongoDB using PyMongo?
Attempts
db.myCollection.insert(df.to_dict())
gave an error
InvalidDocument: documents must have only string keys, the key was Timestamp('2013-11-23 13:31:00', tz=None)
db.myCollection.insert(df.to_json())
gave an error
TypeError: 'str' object does not support item assignment
db.myCollection.insert({id: df.to_json()})
gave an error
InvalidDocument: documents must have only string keys, key was <built-in function id>
df
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 150 entries, 2013-11-23 13:31:26 to 2013-11-23 13:24:07
Data columns (total 3 columns):
amount    150  non-null values
price     150  non-null values
tid       150  non-null values
dtypes: float64(2), int64(1)
Insert Into Collection
To insert a record, or document as it is called in MongoDB, into a collection, use the insert_one() method. The first parameter of insert_one() is a dictionary containing the name(s) and value(s) of each field in the document you want to insert.
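For contrast with the bulk approach, here is a minimal sketch that inserts each DataFrame row as its own document with insert_one(). The helper name insert_rows_one_by_one is hypothetical, and it assumes `collection` is a PyMongo Collection. Each call is a separate round trip to the server, so for a large frame this is typically much slower than one insert_many() call.

```python
import pandas as pd


def insert_rows_one_by_one(collection, df: pd.DataFrame) -> list:
    """Hypothetical helper: insert every row of df as a separate document.

    Assumes `collection` is a pymongo Collection (or anything with insert_one()).
    """
    inserted_ids = []
    # reset_index() turns the DatetimeIndex into a regular "index" column, and
    # to_dict("records") yields one dict per row keyed by the column names.
    for document in df.reset_index().to_dict("records"):
        result = collection.insert_one(document)  # one server round trip per row
        inserted_ids.append(result.inserted_id)
    return inserted_ids
```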
What is PyMongo?
PyMongo is MongoDB's official native driver for Python. It's a library that lets you connect to a MongoDB database and query the data stored using the MongoDB Query API. It is the recommended way to interface with the document database.
Here is the quickest way: use the insert_many method from PyMongo 3 together with the 'records' orient of the to_dict method.
db.collection.insert_many(df.to_dict('records'))
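One caveat worth checking: to_dict('records') emits only the column values, so the DatetimeIndex from the question would be dropped. A minimal sketch, assuming you want to keep the timestamps as a regular field, is to call reset_index() first. The helper name insert_dataframe is hypothetical, and the empty-frame guard is there because PyMongo's insert_many() rejects an empty list.

```python
import pandas as pd


def insert_dataframe(collection, df: pd.DataFrame) -> int:
    """Hypothetical helper: bulk-insert all rows of df with a single insert_many().

    Assumes `collection` is a pymongo Collection (PyMongo 3+).
    """
    # Move the index (e.g. a DatetimeIndex) into a column so it is not lost,
    # then build one dict per row.
    records = df.reset_index().to_dict("records")
    if records:  # insert_many() raises TypeError on an empty list
        collection.insert_many(records)
    return len(records)
```

With a real connection this would be called as something like insert_dataframe(MongoClient()["mydb"]["myCollection"], df), where mydb is a placeholder database name.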
I doubt there is a method that is both the quickest and simple. If you don't mind the data conversion, you can do:
>>> import datetime
>>> import json
>>> import pandas as pd
>>> df = pd.DataFrame.from_dict({'A': {1: datetime.datetime.now()}})
>>> df
                           A
1 2013-11-23 21:14:34.118531
>>> records = list(json.loads(df.T.to_json()).values())
>>> db.myCollection.insert_many(records)
But if you try to load the data back, you'll get:
>>> df = read_mongo(db, 'myCollection')
>>> df
                     A
0  1385241274118531000
>>> df.dtypes
A    int64
dtype: object
So you'll have to convert the 'A' column back to datetimes, as well as every field in your DataFrame that is not an int, float or str. For this example:
>>> df['A'] = pd.to_datetime(df['A'])
>>> df
                           A
0 2013-11-23 21:14:34.118531
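The read_mongo() call above is not part of pandas or PyMongo. A minimal sketch of such a helper, assuming `db` is a PyMongo Database and that you want to drop MongoDB's _id field, could look like the following, together with a small hypothetical restore_datetimes() that re-applies the conversion. The unit is an assumption: the session above shows nanosecond values, but current pandas releases write dates in to_json at millisecond precision by default (the date_unit parameter), in which case you would pass unit="ms".

```python
import pandas as pd


def read_mongo(db, collection_name, query=None, no_id=True):
    """Hypothetical helper: load the documents matching `query` into a DataFrame.

    Assumes `db` is a pymongo Database, so db[collection_name] is a Collection.
    """
    cursor = db[collection_name].find(query or {})  # {} matches every document
    df = pd.DataFrame(list(cursor))
    if no_id:
        # MongoDB adds an _id field to every inserted document; drop it if present.
        df = df.drop(columns="_id", errors="ignore")
    return df


def restore_datetimes(df, columns, unit="ns"):
    """Hypothetical helper: turn integer epoch columns back into datetimes."""
    df = df.copy()
    for col in columns:
        df[col] = pd.to_datetime(df[col], unit=unit)
    return df
```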