 

Load huge data from BigQuery into python/pandas/dask

I read other similar threads and searched Google to find a better way but couldn't find any workable solution.

I have a large table in BigQuery (assume around 20 million rows inserted per day). I want to pull roughly 20 million rows, with around 50 columns, into python/pandas/dask to do some analysis. I have tried the bqclient, pandas-gbq and BigQuery Storage API approaches, but it takes about 30 minutes to get 5 million rows into Python. Is there any other way to do this, or any Google service that can do a similar job?
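
For reference, something like the following is roughly what that client-library path looks like (a sketch, not the asker's actual code; the project and table names are placeholders, and the `create_bqstorage_client` flag assumes a reasonably recent google-cloud-bigquery plus the google-cloud-bigquery-storage package installed):

    from google.cloud import bigquery

    # placeholder project and table names
    client = bigquery.Client(project="my-project")
    sql = "SELECT * FROM `my-project.dataset.tablename`"

    # to_dataframe() can stream the result over the BigQuery Storage API
    # when google-cloud-bigquery-storage is installed, but every row still
    # ends up in local memory, which is why this gets slow at tens of
    # millions of rows.
    df = client.query(sql).to_dataframe(create_bqstorage_client=True)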

asked Mar 06 '19 by MT467

1 Answer

Instead of querying, you can always export the table to Cloud Storage -> download it locally -> load it into your dask/pandas dataframe:

  1. Export + Download:

    # export the table to sharded CSVs in Cloud Storage, keeping the header
    # row so the column names survive, then pull the shards down locally
    bq --location=US extract \
        --destination_format=CSV \
        --print_header=true \
        'dataset.tablename' \
        'gs://mystoragebucket/data-*.csv' \
    && gsutil -m cp 'gs://mystoragebucket/data-*.csv' /my/local/dir/

  2. Load into Dask (a short usage sketch follows below):

    >>> import dask.dataframe as dd
    >>> df = dd.read_csv("/my/local/dir/*.csv")
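
The read_csv call above is lazy: it only inspects the files, and the heavy I/O happens when you actually compute something. A rough usage sketch of what step 2 gives you (the column name below is hypothetical, since the real schema isn't shown):

    >>> df.npartitions                                # number of partitions (at least one per CSV shard)
    >>> df.head()                                     # reads just the first partition
    >>> df.groupby("some_column").size().compute()    # triggers the full parallel read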
    

Hope it helps.

answered Sep 20 '22 by khan