 

Access Google BigQuery Data from local Jupyter Notebooks

I have a few notebooks up and running on DataLab. For a variety of reasons, I'd like to access the same data from a local Jupyter notebook on my machine.

This question suggested a few approaches, none of which I have been able to get working so far.

Specifically, the gcloud library:

from gcloud import bigquery
client = bigquery.Client()

gives me a stack trace whose last line is:

ContextualVersionConflict: (protobuf 2.6.1 (/usr/local/lib/python2.7/dist-packages), Requirement.parse('protobuf!=3.0.0.b2.post1,>=3.0.0b2'), set(['gcloud']))
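The conflict above says the installed protobuf (2.6.1) is older than the `>=3.0.0b2` that gcloud requires. A minimal sketch for checking which versions the notebook's interpreter actually sees, assuming Python 3.8+ (for `importlib.metadata`). The helper names `installed_version`, `major` and `report` are hypothetical, and the major-version comparison is only a rough check, not full requirement parsing:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def major(ver):
    """Rough major-version number, e.g. '2.6.1' -> 2 (hypothetical helper)."""
    return int(ver.split(".")[0])


def report(dist_names):
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # protobuf is the package named in the ContextualVersionConflict.
    for name, ver in report(["protobuf", "gcloud"]).items():
        print(f"{name}: {ver or 'not installed'}")
    pb = installed_version("protobuf")
    if pb is not None and major(pb) < 3:
        print("protobuf looks older than the 3.x that gcloud asks for")
```

If protobuf still reports 2.x, upgrading it in the same environment the notebook kernel uses (for example `pip install --upgrade protobuf`) is one likely fix.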

The Pandas library seems promising:

import pandas as pd
df = pd.io.gbq.read_gbq('SELECT CCS_Category_ICD9, Gender, Admit_Month FROM [xxxxxxxx-xxxxx:xxxx_100MB_newform.xxxxxx_100MB_newform] ORDER BY CCS_Category_ICD9',
                        project_id='xxxxxxxx-xxxxx')

This also gives me a stack trace:

IOError: [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/httplib2-0.9.1.dist-info/METADATA'

Perhaps I have an auth issue with the pandas approach, although my browser is currently authenticated to the project? Or am I missing a dependency?

Any suggestions or guidance appreciated.

What is the best way to access a BigQuery data source from within a local Jupyter notebook?

asked May 17 '16 at 19:05 by dartdog




1 Answer

Based on the error from gbq.read_gbq(), it appears that httplib2 may not be installed correctly. The pandas installation page lists a few optional dependencies that are required for Google BigQuery support, and httplib2 is one of them. To reinstall/repair it, try:

pip install httplib2 --ignore-installed
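The `IOError` in the question points at a missing `METADATA` file inside httplib2's `.dist-info` directory. A small sketch for confirming that the reinstall actually restored it, assuming Python 3.8+; `metadata_ok` is a hypothetical helper, and falling back to `PKG-INFO` is an assumption meant to cover older egg-style installs:

```python
from importlib.metadata import PackageNotFoundError, distribution


def metadata_ok(dist_name):
    """True if the distribution's metadata file is readable, False if it is
    missing, None if the distribution is not installed at all."""
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return None
    # read_text() returns None instead of raising when the file is absent.
    # Wheel installs (.dist-info) use METADATA; older egg installs use PKG-INFO.
    return any(dist.read_text(name) is not None for name in ("METADATA", "PKG-INFO"))


if __name__ == "__main__":
    print("httplib2 metadata ok:", metadata_ok("httplib2"))
```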

Once the optional dependencies for Google BigQuery support are installed, the following code should work:

from pandas.io import gbq
df = gbq.read_gbq('SELECT * FROM MyDataset.MyTable', project_id='my-project-id')
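In newer environments, the `gcloud` package from the question has been superseded by `google-cloud-bigquery`. A hedged sketch of the same read with that client, assuming `google-cloud-bigquery` and pandas are installed and Application Default Credentials are configured (for example via `gcloud auth application-default login`). `read_bigquery` and `to_standard_sql` are hypothetical helpers. Because recent versions of this client run standard SQL by default, the sketch also rewrites legacy `[project:dataset.table]` references like the one in the question:

```python
import re

# Legacy SQL table reference such as [my-project:my_dataset.my_table].
_LEGACY_TABLE_REF = re.compile(r"\[([^:\]]+):([^.\]]+)\.([^\]]+)\]")


def to_standard_sql(sql):
    """Rewrite legacy [project:dataset.table] refs as standard-SQL `project.dataset.table`."""
    return _LEGACY_TABLE_REF.sub(r"`\1.\2.\3`", sql)


def read_bigquery(sql, project_id):
    """Run a query and return a pandas DataFrame (sketch; needs credentials)."""
    # Imported lazily so this module loads even where google-cloud-bigquery is absent.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    # query() submits the job; to_dataframe() waits for it and downloads the rows.
    return client.query(sql).to_dataframe()
```

Usage would look like `read_bigquery(to_standard_sql("SELECT ... FROM [my-project:my_dataset.my_table]"), "my-project-id")`. The regex only handles the `[project:dataset.table]` form; bare `[dataset.table]` references are left unchanged.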
answered Sep 24 '22 at 13:09 by Anthonios Partheniou