
Pandas - insert a dataframe to MongoDB

I am trying to insert a dataframe into MongoDB. Each row should be one document.

from pymongo import MongoClient
import pandas as pd

client = MongoClient()
col = client['test']['test']

d = {'name': ['Braund', 'Cummings', 'Heikkinen', 'Allen'],
     'age': [22,38,26,35],
     'fare': [7.25, 71.83, 0 , 8.05],
     'survived?': [False, True, True, False]}

df = pd.DataFrame(d)

col.insert_many(df)

However, the code above raises an error: ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Changing col.insert_many(df) to col.insert_many(df.to_dict()) or col.insert_many(df.to_json()) causes TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping

Munichong asked Mar 11 '18 15:03




1 Answer

You were very close, and most of your code can be reused.

We still use pymongo.MongoClient and pandas.DataFrame.to_dict, but pass one extra parameter to the latter: to_dict(orient='records'). That gives:

from pymongo import MongoClient
import pandas as pd

client = MongoClient()  # defaults to localhost:27017; pass your URI string otherwise
col = client['test']['test']

df = pd.DataFrame({'name': ['Braund','Cummings','Heikkinen','Allen'],
                   'age': [22,38,26,35],
                   'fare': [7.25, 71.83, 0 , 8.05],
                   'survived?': [False, True, True, False]})

data = df.to_dict(orient='records')  # the added parameter: one dict per row

col.insert_many(data)

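In short, with orient='records', to_dict returns a list of dicts, one per row, which is the format insert_many accepts. As a bonus, to_dict handles datetime columns better than to_json: the values stay datetime objects (pandas Timestamps) that PyMongo can store as dates, whereas to_json serializes everything into a single JSON string.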
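To see the difference without a running MongoDB server, here is a small sketch that compares the three conversions side by side. The 'boarded' column and its dates are made up for illustration and are not part of the original data; the rest uses only pandas and the standard library.

```python
import datetime
import json

import pandas as pd

df = pd.DataFrame({
    'name': ['Braund', 'Cummings'],
    'age': [22, 38],
    # hypothetical datetime column, added only for this illustration
    'boarded': pd.to_datetime(['2018-03-10', '2018-03-11']),
})

# Default orient: one dict keyed by column name, not a list of documents.
by_column = df.to_dict()

# orient='records': a list with one dict per row, the shape insert_many expects.
records = df.to_dict(orient='records')

# to_json returns a str; parsing it back shows the dates are no longer datetimes.
as_json = df.to_json(orient='records')
from_json = json.loads(as_json)

print(type(by_column).__name__, list(by_column))
print(type(records).__name__, records[0])
print(isinstance(records[0]['boarded'], datetime.datetime))
print(type(as_json).__name__, type(from_json[0]['boarded']).__name__)
```

Iterating over by_column yields column names (plain strings), and as_json is itself one string, which is why both earlier attempts were rejected as non-documents.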

Friedrich answered Sep 19 '22 09:09