Here is what I tried (IPython notebook, with Python 2.7):
import gcp
import gcp.storage as storage
import gcp.bigquery as bq
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Build the GCS path to the target file in the project's Datalab bucket
sample_bucket_name = gcp.Context.default().project_id + '-datalab'
sample_bucket_path = 'gs://' + sample_bucket_name
sample_bucket_object = sample_bucket_path + '/myFile.csv'
sample_bucket = storage.Bucket(sample_bucket_name)
# This is the line that fails
df = bq.Query(sample_bucket_object).to_dataframe()
This fails. Would you have any leads on what I am doing wrong?
CSV files contain plain text and are a well-known format that can be read by everyone, including Pandas.
Use Cloud Datalab to easily explore, visualize, analyze, and transform data using familiar languages, such as Python and SQL, interactively.
Based on the Datalab source code, bq.Query() is primarily used to execute BigQuery SQL queries. For reading a file from Google Cloud Storage (GCS), one potential solution is to use the Datalab %gcs line magic to read the CSV from GCS into a local variable. Once you have the data in a variable, you can use pd.read_csv() to convert the CSV-formatted data into a pandas DataFrame. The following should work:
import pandas as pd
from StringIO import StringIO  # Python 2 import; see the Python 3 variant below

# Read the CSV file from GCS into a local variable
%gcs read --object gs://cloud-datalab-samples/cars.csv --variable cars

# Parse the CSV text into a pandas DataFrame
df = pd.read_csv(StringIO(cars))
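For contrast, here is the kind of call bq.Query() is actually meant for: SQL text rather than a GCS path. This is a minimal sketch using the same gcp.bigquery module as the question; the table reference is a hypothetical placeholder, not a real dataset:
import gcp.bigquery as bq

# bq.Query takes SQL, not a gs:// path; the table name below is illustrative
df = bq.Query('SELECT * FROM [my-project:my_dataset.my_table] LIMIT 10').to_dataframe()
df.head()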
There is also a related Stack Overflow question at the following link: Reading in a file with Google datalab
In addition to @Flair's comments about %gcs, I got the following to work for the Python 3 kernel:
import pandas as pd
from io import BytesIO

# On the Python 3 kernel the object is read in as bytes, hence BytesIO
%gcs read --object "gs://[BUCKET ID]/[FILE].csv" --variable csv_as_bytes
df = pd.read_csv(BytesIO(csv_as_bytes))
df.head()
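If you are running outside Datalab and the %gcs magic is unavailable, the standalone google-cloud-storage client can do the same download. A sketch under that assumption (the package must be installed and credentials configured; [BUCKET ID] and [FILE].csv are the same placeholders as above):
import pandas as pd
from io import BytesIO
from google.cloud import storage  # assumes google-cloud-storage is installed

client = storage.Client()  # uses the environment's default credentials
blob = client.bucket('[BUCKET ID]').blob('[FILE].csv')  # placeholder names
df = pd.read_csv(BytesIO(blob.download_as_string()))
df.head()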