 

How to query the documents from couchdb and load them into pandas dataframe?

I have downloaded Twitter data to a local CouchDB server, where it is stored as JSON documents.

I use this code to access the database from Python. First, import the libraries:

import couchdb
import pandas as pd
from couchdbkit import Server
import json
import cloudant

Next, connect to the server and choose the database I want to work with:

server = couchdb.Server('http://localhost:5984')
db = server['Test']

I can create and delete databases from Python; however, I don't know how to get the data from the server into a Jupyter notebook. I would like to get the text, timestamps, and retweet counts to analyze them. So far I can only see one JSON document from Python.

If possible, I would like to load all the JSON documents in the database into a pandas DataFrame in Python, so I can also analyze the data in R.

The question is: how do I query the documents and load them into a pandas DataFrame?

asked Oct 29 '17 by Tateishi



1 Answer

All the documents in a CouchDB database can be pulled from the /{db}/_all_docs end-point with the include_docs query parameter. The response is a JSON object with all the docs listed in its rows field.

You can either use the requests package to talk to CouchDB directly and load the response into pandas, or use the couchdb package, which translates the JSON into Python objects internally, and build the DataFrame from the result, i.e. do something like this:

import couchdb
import pandas as pd

couch = couchdb.Server('http://localhost:5984')
db = couch['Test']

# _all_docs with include_docs=True returns every document in the database
rows = db.view('_all_docs', include_docs=True)
data = [row['doc'] for row in rows]
df = pd.DataFrame(data)
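The requests route could look like the sketch below. The URL and database name mirror the question's setup; `rows_to_frame` is a hypothetical helper name, and the sample payload just imitates the shape of an _all_docs response so the conversion can be shown without a live server:

```python
import pandas as pd

def rows_to_frame(payload):
    """Convert the JSON body of a /{db}/_all_docs?include_docs=true
    response into a DataFrame, one row per document."""
    return pd.DataFrame([row['doc'] for row in payload['rows']])

def fetch_all_docs(base_url, dbname):
    """Fetch every document from a CouchDB database over HTTP."""
    import requests
    resp = requests.get(f'{base_url}/{dbname}/_all_docs',
                        params={'include_docs': 'true'})
    resp.raise_for_status()
    return rows_to_frame(resp.json())

# A tiny sample payload in the _all_docs shape, to show the conversion:
sample = {'total_rows': 1, 'offset': 0,
          'rows': [{'id': 't1', 'key': 't1',
                    'doc': {'_id': 't1', 'text': 'hello', 'retweet_count': 2}}]}
print(rows_to_frame(sample))

# Against a live server it would be:
# df = fetch_all_docs('http://localhost:5984', 'Test')
```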

Please be aware that reading a complete database into memory can be resource-intensive, so you might want to look into the skip and limit query parameters of the _all_docs end-point to read the data in smaller batches.
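A batched read could be sketched like this, reusing the same couchdb view call as above. The generator name and batch size are illustrative; note that skip-based paging is simple but gets slower at large offsets:

```python
import pandas as pd

def iter_doc_batches(db, batch_size=1000):
    """Yield lists of documents from a couchdb database object,
    paging through _all_docs with skip/limit to bound memory use."""
    skip = 0
    while True:
        rows = db.view('_all_docs', include_docs=True,
                       skip=skip, limit=batch_size)
        batch = [row['doc'] for row in rows]
        if not batch:
            break
        yield batch
        skip += batch_size

# Usage against the database from the answer (assumes it exists):
# couch = couchdb.Server('http://localhost:5984')
# df = pd.concat((pd.DataFrame(b) for b in iter_doc_batches(couch['Test'])),
#                ignore_index=True)
```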

answered Nov 15 '22 by eiri