I have a 0.7 GB MongoDB database containing tweets that I'm trying to load into a dataframe. However, I get an error.
MemoryError:
My code looks like this:
cursor = tweets.find() #Where tweets is my collection
tweet_fields = ['id']
result = DataFrame(list(cursor), columns = tweet_fields)
I've tried the methods from other answers, which at some point create a list of all the elements of the database before loading it.
However, in another answer which talks about list(), the person said that it's good for small data sets, because everything is loaded into memory.
In my case, I think that's the source of the error: it's too much data to load into memory. What other method can I use?
I've modified my code to the following:
cursor = tweets.find(fields=['id'])  # note: PyMongo 3+ renamed fields= to projection=
tweet_fields = ['id']
result = DataFrame(list(cursor), columns=tweet_fields)
By adding the fields parameter to the find() call I restricted the output, so only the selected fields (rather than every field) are loaded into the DataFrame. Everything works fine now.
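If the data still doesn't fit comfortably in memory even with the projection, another option is to build the DataFrame in chunks straight from the cursor instead of materializing one big list of documents first. This is a minimal sketch (not from the original answers), assuming a tweets collection reachable through a pymongo client and using a hypothetical chunked() helper:

import pandas as pd
import pymongo

client = pymongo.MongoClient()
tweets = client.twitter_db.tweets  # hypothetical database/collection names

tweet_fields = ['id']
cursor = tweets.find({}, {'id': 1})  # projection: fetch only the id field

def chunked(cursor, size=10000):
    # Yield lists of documents from the cursor, `size` documents at a time
    chunk = []
    for doc in cursor:
        chunk.append(doc)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# Convert each chunk to a small DataFrame so the full list of dicts is never held at once;
# the final concatenated frame still has to fit in memory, but rows are far lighter than dicts.
frames = [pd.DataFrame(chunk, columns=tweet_fields) for chunk in chunked(cursor)]
result = pd.concat(frames, ignore_index=True)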
The from_records classmethod is probably the best way to do it:
import pandas as pd
import pymongo

client = pymongo.MongoClient()
data = client.mydb.mycollection.find()  # or client.mydb.mycollection.aggregate(pipeline)
df = pd.DataFrame.from_records(data)
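Note that from_records also pulls the whole cursor into memory, so on a large collection it can hit the same MemoryError; combining it with a projection, as in the answer above, keeps the footprint down. A small hedged variant, assuming the same mydb.mycollection names:

import pandas as pd
import pymongo

client = pymongo.MongoClient()
# Fetch only the fields you need and drop MongoDB's _id to keep the frame small
data = client.mydb.mycollection.find({}, {'_id': 0, 'id': 1})
df = pd.DataFrame.from_records(data)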