I am trying to insert a dataframe into MongoDB. Each row should be one document.
from pymongo import MongoClient
import pandas as pd
client = MongoClient()
col = client['test']['test']
d = {'name': ['Braund', 'Cummings', 'Heikkinen', 'Allen'],
'age': [22,38,26,35],
'fare': [7.25, 71.83, 0 , 8.05],
'survived?': [False, True, True, False]}
df = pd.DataFrame(d)
col.insert_many(df)
However, the code above raises an error: ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Changing col.insert_many(df) to col.insert_many(df.to_dict()) or col.insert_many(df.to_json()) raises TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
PyMongo, the standard MongoDB driver library for Python, is easy to use and offers an intuitive API for accessing databases, collections, and documents. Objects retrieved from MongoDB through PyMongo are compatible with dictionaries and lists, so we can easily manipulate, iterate, and print them.
You can delete a table, or collection as it is called in MongoDB, by using the drop() method.
For my typical MySQL tables, I usually use sqlalchemy to create a database object. However, sqlalchemy relies on an ORM (object-relational mapper), which we cannot use with MongoDB: MongoDB is a NoSQL document database and does not work in the same way. Because of this, we will use a Python package called PyMongo.
You were very close; we can reuse your code.
Note that we still use pymongo.MongoClient and pandas.DataFrame.to_dict, with one parameter added to the latter: to_dict(orient='records'). We would then have:
from pymongo import MongoClient
import pandas as pd
client = MongoClient() # Remember your uri string
col = client['test']['test']
df = pd.DataFrame({'name': ['Braund','Cummings','Heikkinen','Allen'],
'age': [22,38,26,35],
'fare': [7.25, 71.83, 0 , 8.05],
'survived?': [False, True, True, False]})
data = df.to_dict(orient='records') # Here's our added param..
col.insert_many(data)
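In short, with orient='records' the output is a list of dicts, one per row, which is the format insert_many accepts. As a bonus, to_dict handles datetime-type columns better than to_json: it keeps them as timestamp objects, whereas to_json serializes the whole frame to a string and writes dates out as numbers or text.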
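To make the two shapes concrete, here is a minimal sketch in plain Python, with no pandas or MongoDB needed. The to_records helper is hypothetical and only mimics what to_dict(orient='records') returns. The exact validation PyMongo performs may differ between versions, so treat the explanation in the comments as an illustration consistent with the TypeError above, not as a description of the driver's internals.

```python
# Same data as in the question, kept as plain column lists.
columns = {
    'name': ['Braund', 'Cummings', 'Heikkinen', 'Allen'],
    'age': [22, 38, 26, 35],
    'fare': [7.25, 71.83, 0, 8.05],
    'survived?': [False, True, True, False],
}


def to_records(cols):
    """Hypothetical helper: build one dict per row, like orient='records'."""
    names = list(cols)
    return [dict(zip(names, row)) for row in zip(*cols.values())]


# Shape of df.to_dict() with the default orient: column -> {row index -> value}.
column_oriented = {
    name: dict(enumerate(values)) for name, values in columns.items()
}

# Iterating over a dict yields its keys, so each "document" here is a string,
# which matches the "document must be an instance of dict" complaint.
as_seen_when_iterated = list(column_oriented)

# Row-oriented shape: a list of dicts, one per future MongoDB document.
records = to_records(columns)

print(as_seen_when_iterated)
print(records[0])
```

Each element of records is a dict describing a single row, so a list like this is what insert_many expects, whereas the column-oriented dict only offers column names when iterated.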