I am trying to fetch data from BigQuery. Everything works fine when I fetch a small amount of data, but when I try to fetch a large amount it takes forever. Is there a more efficient way?
So far I am using this:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'cred.json'
import google.auth
from google.cloud import bigquery
%load_ext google.cloud.bigquery
import google.datalab.bigquery as bq
from google.cloud.bigquery import Client
client = bigquery.Client()
Here is my SQL command:
sql = """
SELECT bla, bla1, bla2
FROM table
"""
df = client.query(sql)
df.to_dataframe()
You can load BigQuery data into a dataframe orders of magnitude faster by changing the download method.
Check how these options are reflected in the chart:
to_dataframe() - Uses BigQuery tabledata.list API.
to_dataframe(bqstorage_client=bqstorage_client), package version 1.16.0 - Uses BigQuery Storage API with Avro data format.
to_dataframe(bqstorage_client=bqstorage_client), package version 1.17.0 - Uses BigQuery Storage API with Arrow data format.
to_arrow(bqstorage_client=bqstorage_client).to_pandas(), package version 1.17.0 - Uses BigQuery Storage API with Arrow data format.
Note how you can go from more than 500 seconds to about 20 seconds by using to_arrow(bqstorage_client=bqstorage_client).to_pandas().
See https://medium.com/google-cloud/announcing-google-cloud-bigquery-version-1-17-0-1fc428512171
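A minimal sketch of the fastest option applied to the query from the question. It assumes the google-cloud-bigquery-storage, pyarrow, and pandas packages are installed alongside google-cloud-bigquery, and that credentials are set through GOOGLE_APPLICATION_CREDENTIALS as in the question. The helper name fetch_dataframe is hypothetical, and BigQueryReadClient is the client class in recent google-cloud-bigquery-storage releases; older releases exposed a differently named client, so check the version you have installed.

```python
def fetch_dataframe(sql, client=None, bqstorage_client=None):
    """Run `sql` and return a pandas DataFrame, downloading rows via the Storage API.

    `fetch_dataframe` is a hypothetical helper, not part of any Google library.
    Clients can be passed in; otherwise default ones are created on demand.
    """
    if client is None:
        # Imported lazily so the helper can also be used with injected clients.
        from google.cloud import bigquery
        client = bigquery.Client()
    if bqstorage_client is None:
        # Assumption: a recent google-cloud-bigquery-storage release is installed.
        from google.cloud import bigquery_storage
        bqstorage_client = bigquery_storage.BigQueryReadClient()

    # Wait for the query job to finish and get an iterator over its result rows.
    rows = client.query(sql).result()

    # Download as an Arrow table through the Storage API, then convert to pandas.
    return rows.to_arrow(bqstorage_client=bqstorage_client).to_pandas()
```

With the sql string from the question, calling df = fetch_dataframe(sql) would replace the client.query(sql) and to_dataframe() lines. Selecting only the columns you need still matters, because the download time grows with the amount of data transferred.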