Here is what I tried (IPython notebook, with Python 2.7):
import gcp
import gcp.storage as storage
import gcp.bigquery as bq
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Build the GCS path to the target file in the project's Datalab bucket
sample_bucket_name = gcp.Context.default().project_id + '-datalab'
sample_bucket_path = 'gs://' + sample_bucket_name
sample_bucket_object = sample_bucket_path + '/myFile.csv'
sample_bucket = storage.Bucket(sample_bucket_name)
# This is the line that fails
df = bq.Query(sample_bucket_object).to_dataframe()
This fails. Would you have any leads on what I am doing wrong?
CSV files contain plain text and are a well-known format that can be read by everyone, including Pandas.
Use Cloud Datalab to easily explore, visualize, analyze, and transform data using familiar languages, such as Python and SQL, interactively.
Based on the Datalab source code, bq.Query() is primarily used to execute BigQuery SQL queries. For reading a file from Google Cloud Storage (GCS), one potential solution is to use the Datalab %gcs line magic to read the CSV from GCS into a local variable. Once you have the data in a variable, you can use pd.read_csv() to convert the CSV-formatted data into a pandas DataFrame. The following should work:
import pandas as pd
from StringIO import StringIO  # Python 2 import; see the Python 3 variant below

# Read the CSV file from GCS into a local variable
%gcs read --object gs://cloud-datalab-samples/cars.csv --variable cars

# Parse the CSV text into a pandas DataFrame
df = pd.read_csv(StringIO(cars))
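For contrast, here is the kind of call bq.Query() is actually meant for: SQL text rather than a GCS path. This is a minimal sketch using the same gcp.bigquery module as the question; the table reference is a hypothetical placeholder, not a real dataset:
import gcp.bigquery as bq

# bq.Query takes SQL, not a gs:// path; the table name below is illustrative
df = bq.Query('SELECT * FROM [my-project:my_dataset.my_table] LIMIT 10').to_dataframe()
df.head()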
There is also a related Stack Overflow question at the following link: Reading in a file with Google datalab
In addition to @Flair's comments about %gcs, I got the following to work for the Python 3 kernel:
import pandas as pd
from io import BytesIO

# On the Python 3 kernel the object is read in as bytes, hence BytesIO
%gcs read --object "gs://[BUCKET ID]/[FILE].csv" --variable csv_as_bytes
df = pd.read_csv(BytesIO(csv_as_bytes))
df.head()
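If you are running outside Datalab and the %gcs magic is unavailable, the standalone google-cloud-storage client can do the same download. A sketch under that assumption (the package must be installed and credentials configured; [BUCKET ID] and [FILE].csv are the same placeholders as above):
import pandas as pd
from io import BytesIO
from google.cloud import storage  # assumes google-cloud-storage is installed

client = storage.Client()  # uses the environment's default credentials
blob = client.bucket('[BUCKET ID]').blob('[FILE].csv')  # placeholder names
df = pd.read_csv(BytesIO(blob.download_as_string()))
df.head()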