
How can I load my CSV from Google Datalab into a pandas DataFrame?

Here is what I tried (IPython notebook, Python 2.7):

import gcp
import gcp.storage as storage
import gcp.bigquery as bq
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

sample_bucket_name = gcp.Context.default().project_id + '-datalab'
sample_bucket_path = 'gs://' + sample_bucket_name 
sample_bucket_object = sample_bucket_path + '/myFile.csv'
sample_bucket = storage.Bucket(sample_bucket_name)
df = bq.Query(sample_bucket_object).to_dataframe()

This fails. Any leads on what I am doing wrong?

Cy Bu asked Jun 23 '16 11:06



2 Answers

Based on the Datalab source code, bq.Query() is primarily used to execute BigQuery SQL queries, not to read files. To read a file from Google Cloud Storage (GCS), one potential solution is to use the Datalab %gcs line magic to read the CSV from GCS into a local variable. Once the data is in a variable, you can use pd.read_csv() to convert the CSV-formatted data into a pandas DataFrame. The following should work:

import pandas as pd
from StringIO import StringIO

# Read csv file from GCS into a variable
%gcs read --object gs://cloud-datalab-samples/cars.csv --variable cars

# Store in a pandas dataframe
df = pd.read_csv(StringIO(cars))
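
The parsing step can be tried outside Datalab. What follows is a minimal sketch, assuming a Python 3 kernel and assuming the variable filled by %gcs holds the CSV content as text. The csv_text value and its column names are made up for illustration and are not the contents of cars.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for the text that `%gcs read ... --variable cars` would store.
csv_text = "make,model,year\nFord,Fiesta,2012\nHonda,Civic,2015\n"

# pd.read_csv accepts any file-like object, so wrap the string in an in-memory buffer.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df.dtypes)
```

The first line of the text is used as the header row, and pandas infers a dtype for each column.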

There is also a related Stack Overflow question: "Reading in a file with Google datalab".

Anthonios Partheniou answered Jan 01 '23 22:01


In addition to @Flair's comments about %gcs, I got the following to work for the Python 3 kernel:

    import pandas as pd
    from io import BytesIO

    %gcs read --object "gs://[BUCKET ID]/[FILE].csv" --variable csv_as_bytes

    df = pd.read_csv(BytesIO(csv_as_bytes))
    df.head()
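
If it is unclear whether the variable holds bytes or text, one option is a small wrapper that handles both. This is only a sketch: csv_payload_to_dataframe is a hypothetical helper name, the sample data is made up, and UTF-8 encoding is an assumption.

```python
import io

import pandas as pd


def csv_payload_to_dataframe(payload, encoding="utf-8"):
    """Parse CSV content held in memory as either bytes or str (hypothetical helper)."""
    if isinstance(payload, bytes):
        # Bytes go through BytesIO; pandas decodes them with the given encoding.
        return pd.read_csv(io.BytesIO(payload), encoding=encoding)
    # Text goes through StringIO and needs no decoding.
    return pd.read_csv(io.StringIO(payload))


# Made-up payload standing in for what `%gcs read` stored in csv_as_bytes.
sample_bytes = b"name,score\nalice,1\nbob,2\n"

df_from_bytes = csv_payload_to_dataframe(sample_bytes)
df_from_text = csv_payload_to_dataframe(sample_bytes.decode("utf-8"))
```

Both calls should yield the same DataFrame, so the rest of the notebook does not need to care which kernel produced the variable.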

Tony answered Jan 01 '23 23:01