Insert large amount of data to BigQuery via bigquery-python library

I have large CSV and Excel files. I read them, dynamically build the needed CREATE TABLE script based on the fields and types they contain, and then insert the data into the created table.
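(For context, here is a minimal sketch of the kind of dynamic schema inference described above, using pandas; infer_schema and TYPE_MAP are hypothetical names, and the dtype-to-BigQuery type mapping is an assumption, not part of the original code.)

import pandas as pd

# Assumed mapping from pandas dtypes to BigQuery column types
TYPE_MAP = {'int64': 'INTEGER', 'float64': 'FLOAT', 'bool': 'BOOLEAN', 'object': 'STRING'}

def infer_schema(path):
    df = pd.read_csv(path)  # or pd.read_excel(path) for the Excel files
    # One schema entry per column, defaulting unknown dtypes to STRING
    return [{'name': col, 'type': TYPE_MAP.get(str(dtype), 'STRING')}
            for col, dtype in df.dtypes.items()]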

I have read this and understood that I should send the data with jobs.insert() instead of tabledata().insertAll() for large amounts of data.

This is how I call it (it works for smaller files, but not for large ones):

result = client.push_rows(datasetname, table_name, insertObject)  # insertObject is a list of dictionaries

When I use the library's push_rows(), it gives this error on Windows:

[Errno 10054] An existing connection was forcibly closed by the remote host

and this on Ubuntu:

[Errno 32] Broken pipe

So when I went through the BigQuery-Python code, I saw that it uses table_data().insertAll().

How can I do this with this library? I know we can upload through Google Cloud Storage, but I need a direct upload method.
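(For reference, a minimal sketch of the jobs.insert() direct-upload path mentioned above, using google-api-python-client; the project ID, file name, and CSV settings are placeholder assumptions, and application default credentials are assumed.)

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

service = build('bigquery', 'v2')  # assumes application default credentials

job_body = {
    'configuration': {
        'load': {
            'destinationTable': {
                'projectId': 'my-project',  # placeholder values
                'datasetId': datasetname,
                'tableId': table_name,
            },
            'sourceFormat': 'CSV',
            'skipLeadingRows': 1,
        }
    }
}

# Resumable media upload sends the file contents with the load job itself
media = MediaFileUpload('data.csv', mimetype='application/octet-stream',
                        resumable=True)
job = service.jobs().insert(projectId='my-project',
                            body=job_body, media_body=media).execute()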

asked Aug 16 '16 by Marlon Abeykoon

1 Answer

When handling large files, don't use streaming; use batch loading instead. Streaming will easily handle up to 100,000 rows per second, which is great for streaming use cases, but not for loading large files.

The sample code linked is doing the right thing (batch instead of streaming), so what we see is a different problem: the sample code tries to load all the data straight into BigQuery, but the upload-through-POST step fails. gsutil has a more robust upload algorithm than a plain POST.

Solution: instead of loading big chunks of data through POST, stage them in Google Cloud Storage first, then tell BigQuery to read the files from GCS.
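(A minimal sketch of that staged approach with the google-cloud client libraries; the bucket, project, dataset, and table names are placeholders, and schema autodetection is just one way to supply the schema.)

from google.cloud import storage, bigquery

# Stage the file in GCS; upload_from_filename performs a resumable upload
storage_client = storage.Client()
bucket = storage_client.bucket('my-staging-bucket')
bucket.blob('uploads/data.csv').upload_from_filename('data.csv')

# Then run a batch load job that reads the staged file from GCS
bq_client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,  # let BigQuery infer the schema
)
load_job = bq_client.load_table_from_uri(
    'gs://my-staging-bucket/uploads/data.csv',
    'my-project.mydataset.mytable',
    job_config=job_config,
)
load_job.result()  # waits for the load to complete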

See also: BigQuery script failing for large file

answered Sep 30 '22 by Felipe Hoffa