 

How can I load data from mongodb collection into pandas' DataFrame?

I am new to pandas (well, to all things "programming"...), but have been encouraged to give it a try. I have a mongodb database - "test" - with a collection called "tweets". I access the database in ipython:

import sys
import pymongo
from pymongo import Connection
connection = Connection()
db = connection.test
tweets = db.tweets

The structure of the documents in tweets is as follows:

{u'entities': {u'hashtags': [],
  u'symbols': [],
  u'urls': [],
  u'user_mentions': []},
 u'favorite_count': 0,
 u'favorited': False,
 u'filter_level': u'medium',
 u'geo': {u'coordinates': [placeholder coordinate, -placeholder coordinate],
  u'type': u'Point'},
 u'id': 349223842700472320L,
 u'id_str': u'349223842700472320',
 u'in_reply_to_screen_name': None,
 u'in_reply_to_status_id': None,
 u'in_reply_to_status_id_str': None,
 u'in_reply_to_user_id': None,
 u'in_reply_to_user_id_str': None,
 u'lang': u'en',
 u'place': {u'attributes': {},
  u'bounding_box': {u'coordinates': [[[placeholder coordinate, placeholder coordinate],
     [-placeholder coordinate, placeholder coordinate],
     [-placeholder coordinate, placeholder coordinate],
     [-placeholder coordinate, placeholder coordinate]]],
   u'type': u'Polygon'},
  u'country': u'placeholder country',
  u'country_code': u'example',
  u'full_name': u'name, xx',
  u'id': u'user id',
  u'name': u'name',
  u'place_type': u'city',
  u'url': u'http://api.twitter.com/1/geo/id/1820d77fb3f65055.json'},
 u'retweet_count': 0,
 u'retweeted': False,
 u'source': u'<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
 u'text': u'example text',
 u'truncated': False,
 u'user': {u'contributors_enabled': False,
  u'created_at': u'Sat Jan 22 13:42:59 +0000 2011',
  u'default_profile': False,
  u'default_profile_image': False,
  u'description': u'example description',
  u'favourites_count': 100,
  u'follow_request_sent': None,
  u'followers_count': 100,
  u'following': None,
  u'friends_count': 100,
  u'geo_enabled': True,
  u'id': placeholder_id,
  u'id_str': u'placeholder_id',
  u'is_translator': False,
  u'lang': u'en',
  u'listed_count': 0,
  u'location': u'example place',
  u'name': u'example name',
  u'notifications': None,
  u'profile_background_color': u'000000',
  u'profile_background_image_url': u'http://a0.twimg.com/images/themes/theme19/bg.gif',
  u'profile_background_image_url_https': u'https://si0.twimg.com/images/themes/theme19/bg.gif',
  u'profile_background_tile': False,
  u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/241527685/1363314054',
  u'profile_image_url': u'http://a0.twimg.com/profile_images/378800000038841219/8a71d0776da0c48dcc4ef6fee9f78880_normal.jpeg',
  u'profile_image_url_https': u'https://si0.twimg.com/profile_images/378800000038841219/8a71d0776da0c48dcc4ef6fee9f78880_normal.jpeg',
  u'profile_link_color': u'000000',
  u'profile_sidebar_border_color': u'FFFFFF',
  u'profile_sidebar_fill_color': u'000000',
  u'profile_text_color': u'000000',
  u'profile_use_background_image': False,
  u'protected': False,
  u'screen_name': u'placeholder screen_name',
  u'statuses_count': xxxx,
  u'time_zone': u'placeholder time_zone',
  u'url': None,
  u'utc_offset': -21600,
  u'verified': False}}

Now, as far as I understand, pandas' main data structure - a spreadsheet-like table - is called DataFrame. How can I load the data from my "tweets" collection into pandas' DataFrame? And how can I query for a subdocument within the database?

user2161725 asked Jul 23 '13 08:07



1 Answer

Convert the cursor returned by MongoDB's find() into a list before passing it to DataFrame:

import pandas as pd
df = pd.DataFrame(list(tweets.find()))
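
The tweet documents in the question contain nested subdocuments such as user and place, and these would end up as columns holding raw dicts when the list is passed straight to DataFrame. One possible way to get flat columns, assuming pandas 1.0 or later, is pd.json_normalize. The sketch below is only an illustration: the helper name tweets_to_frame and the sample_docs list (a hand-written stand-in for the cursor, with made-up values) are assumptions, not part of the original answer.

```python
import pandas as pd


def tweets_to_frame(docs):
    """Build a DataFrame from an iterable of tweet-like dicts.

    Nested dicts are flattened into dotted column names
    (for example "user.screen_name"); list values are kept as-is.
    """
    # list() materialises a lazy iterable such as a PyMongo cursor.
    return pd.json_normalize(list(docs))


# Hypothetical stand-in for the documents yielded by tweets.find().
sample_docs = [
    {"id_str": "1", "lang": "en", "retweet_count": 0,
     "user": {"screen_name": "alice", "followers_count": 100}},
    {"id_str": "2", "lang": "en", "retweet_count": 3,
     "user": {"screen_name": "bob", "followers_count": 250}},
]

df = tweets_to_frame(sample_docs)
print(sorted(df.columns))
```

With a live collection, the same helper could presumably be called as tweets_to_frame(tweets.find()), since a cursor is just an iterable of dicts.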
waitingkuo answered Sep 21 '22 16:09

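
The second part of the question, querying a subdocument, is not covered by the answer. In MongoDB, fields inside a subdocument are usually addressed with dot notation in the filter passed to find(), and an optional projection limits which fields come back. The following is a minimal sketch under that assumption; the helper find_to_frame and the two example dicts are hypothetical names, and the field names are taken from the document structure shown in the question.

```python
import pandas as pd


def find_to_frame(collection, query=None, projection=None):
    """Run collection.find() and load the matching documents into a DataFrame.

    `collection` is expected to behave like a PyMongo collection, i.e. to
    offer find(filter, projection) returning an iterable of dicts.
    """
    cursor = collection.find(query or {}, projection)
    # Materialise the lazy cursor before handing it to pandas.
    return pd.DataFrame(list(cursor))


# Dot notation reaches into the "user" subdocument.
english_users_query = {"user.lang": "en"}

# Return only these fields; "_id" is included unless explicitly set to 0.
fields = {"_id": 0, "text": 1, "user.screen_name": 1}

# With the `tweets` collection from the question (needs a running MongoDB):
# df = find_to_frame(tweets, english_users_query, fields)
```

A projected subfield such as user.screen_name still comes back nested inside a user dict, so the resulting user column holds dicts unless it is flattened afterwards, for instance with pd.json_normalize.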