Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Export GCP Cloud SQL PostgreSQL to GCS in Parquet Format

I have data in GCP Cloud SQL PostgreSQL, I want to export this data into GCS in Parquet format, I see that it's not possible directly, only I can do in SQL and CSV format, anyways for exporting the data in Parquet format.

like image 892
MadRed Avatar asked May 22 '26 01:05

MadRed


1 Answers

I propose you to achieve this through BigQuery

  1. Connect BigQuery to Cloud SQL postgres instance with Federated queries feature
  2. Create a new table in BigQuery with the Cloud SQL postgres data
CREATE TABLE my_dataset.temp_table AS
SELECT * FROM EXTERNAL_QUERY("project_id.region.connection_name", "SELECT * FROM .....;");
  1. Use the BigQuery export table feature with parquet option (which is in preview stage)
bq --location=region extract \
--destination_format parquet \
project_id:my_dataset.temp_table \
gs://bucket/filename.parquet
  1. Delete the temporary table in BigQuery

This way guaranty you only one file (or a small number of sharded file.

However you can speed up the process, but this solution can generate an higher number of file

  1. Connect BigQuery to Cloud SQL postgres instance with Federated table feature
  2. Export the data to GCS in only one query
EXPORT DATA OPTIONS(
  uri='gs://bucket/filename.parquet*',
  format='PARQUET') AS
SELECT * FROM EXTERNAL_QUERY("project_id.region.connection_name", "SELECT * FROM .....;");

https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements#export_data_statement

like image 80
guillaume blaquiere Avatar answered May 27 '26 11:05

guillaume blaquiere