I was wondering if Google BigQuery currently supports Parquet file format or if there are plans to support it?
I know that it currently supports CSV and JSON formats.
At this time, BigQuery does not support the Parquet file format.
BigQuery supports UTF-8 encoding for both nested/repeated and flat data. ISO-8859-1 encoding is supported only for flat data in CSV files.
Parquet is optimized to work with complex data in bulk and offers efficient data compression and encoding schemes. This makes it especially well suited to queries that read only certain columns from a large table: because Parquet is columnar, it can read just the needed columns, greatly minimizing I/O.
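To illustrate that columnar advantage (a hypothetical sketch, not from the original answers; the file and column names are placeholders), reading a subset of columns with pyarrow touches only those columns' data:

    # Hypothetical example: read only two columns from a Parquet file.
    # Because Parquet stores data column by column, the reader skips the
    # remaining columns entirely, so I/O scales with the columns requested.
    import pyarrow.parquet as pq

    table = pq.read_table("events.parquet", columns=["user_id", "ts"])
    print(table.num_rows, table.column_names)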
BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence.
**Update:** As of 1 March 2018, support for loading Parquet 1.0 files is available.
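For example, a Parquet load from Cloud Storage with the google-cloud-bigquery Python client looks roughly like this (a minimal sketch; the bucket, dataset, and table names are placeholders, not from the original post):

    # Minimal sketch of a Parquet load job using the Python client library.
    # The GCS URI and destination table ID below are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
    )
    load_job = client.load_table_from_uri(
        "gs://my-bucket/data.parquet",  # placeholder source file
        "my_dataset.my_table",          # placeholder destination table
        job_config=job_config,
    )
    load_job.result()  # block until the load job completes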
In the BigQuery CLI, there is a --source_format PARQUET option, which is described in the output of bq --help.
I never got to use it, because when I was experimenting with this feature, it was still invite-only, and I did not request the invite.
My use case was that the Parquet file is half the size of the Avro file. I wanted to try something new and upload data efficiently (in that order).
% bq load --source_format PARQUET test.test3 data.avro.parquet schema.json
Upload complete.
Waiting on bqjob_r5b8a2b16d964eef7_0000015b0690a06a_1 ... (0s) Current status: DONE
[...]