
Fetching data from BigQuery taking very long [duplicate]

I am trying to fetch data from BigQuery. Everything works fine when I fetch a small amount of data, but when I try to fetch a large result set it takes forever. Is there a more efficient way?

So far I am using this:

import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'cred.json'

from google.cloud import bigquery

client = bigquery.Client()

Here is my SQL command:

sql = """
SELECT bla, bla1, bla2
FROM table
"""
df = client.query(sql).to_dataframe()
asked Mar 14 '26 05:03 by s_khan92

1 Answer

You can get BigQuery data into a dataframe orders of magnitude faster by changing the fetch method.

Check how these options are reflected in the chart:

  • A: to_dataframe() - Uses BigQuery tabledata.list API.
  • B: to_dataframe(bqstorage_client=bqstorage_client), package version 1.16.0 - Uses BigQuery Storage API with Avro data format.
  • C: to_dataframe(bqstorage_client=bqstorage_client), package version 1.17.0 - Uses BigQuery Storage API with Arrow data format.
  • D: to_arrow(bqstorage_client=bqstorage_client).to_pandas(), package version 1.17.0 - Uses BigQuery Storage API with Arrow data format.

(chart: time to load BigQuery query results into a dataframe, comparing options A–D)

Note how you can go from >500 seconds to ~20 by using to_arrow(bqstorage_client=bqstorage_client).to_pandas().
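A minimal sketch of option D, assuming google-cloud-bigquery >= 1.17.0 and the google-cloud-bigquery-storage package are installed, and that GOOGLE_APPLICATION_CREDENTIALS points at valid credentials (the table and column names are placeholders from the question):

```python
from google.cloud import bigquery
from google.cloud import bigquery_storage_v1beta1

# Standard BigQuery client plus a separate Storage API client;
# the latter streams results in Arrow format instead of paging
# through tabledata.list.
client = bigquery.Client()
bqstorage_client = bigquery_storage_v1beta1.BigQueryStorageClient()

sql = """
SELECT bla, bla1, bla2
FROM table
"""

# Option D: fetch the result via the BigQuery Storage API as an
# Arrow table, then convert it to a pandas DataFrame.
df = (
    client.query(sql)
    .result()
    .to_arrow(bqstorage_client=bqstorage_client)
    .to_pandas()
)
```

The Arrow path avoids the row-by-row JSON decoding of the default API, which is where most of the time goes on large results.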

See https://medium.com/google-cloud/announcing-google-cloud-bigquery-version-1-17-0-1fc428512171

answered Mar 15 '26 18:03 by Felipe Hoffa

