I'm trying to migrate CSV files from Google Cloud Storage (GCS), which have been exported from BigQuery, to a PostgreSQL Cloud SQL instance using a Python script.
I was hoping to use the Google API but found this in the documentation:
Importing CSV data using the Cloud SQL Admin API is not supported for PostgreSQL instances.
As an alternative, I could use the psycopg2 library and stream the rows of the CSV file into the SQL instance. I can do this in a few different ways.
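For reference, a minimal sketch of what I mean, streaming the CSV from GCS into Postgres with COPY via psycopg2 (the bucket, object, table, and connection details below are placeholders):

import io
import psycopg2
from google.cloud import storage

# Download the exported CSV from GCS into memory (placeholder bucket/object names).
client = storage.Client()
blob = client.bucket("my-export-bucket").blob("preprocessed/data.csv")
csv_bytes = blob.download_as_bytes()

# Stream the rows into the Cloud SQL instance with COPY, which avoids
# row-by-row INSERTs. Connection settings are placeholders.
conn = psycopg2.connect(host="127.0.0.1", dbname="mydb", user="myuser", password="...")
with conn, conn.cursor() as cur:
    cur.copy_expert(
        "COPY my_table FROM STDIN WITH (FORMAT csv, HEADER true)",
        io.BytesIO(csv_bytes),
    )
conn.close()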
My concern is that these CSV files could contain millions of rows, and running this process with any of these options seems like a bad idea to me.
What alternatives do I have? Essentially, I have some raw data in BigQuery on which we do some preprocessing before exporting it to GCS in preparation for importing into the PostgreSQL instance. I need to get this preprocessed data from BigQuery into the PostgreSQL instance.
This is not a duplicate of this question, as I'm looking for a solution that exports data from BigQuery to the PostgreSQL instance, whether it be via GCS or directly.
You can do the import process with Cloud Dataflow as suggested by @GrahamPolley. It's true that this solution involves some extra work (getting familiar with Dataflow, setting everything up, etc). Even with the extra work, this would be the preferred solution for your situation. However, other solutions are available and I'll explain one of them below.
To set up a migration process with Dataflow, this tutorial about exporting BigQuery to Google Datastore is a good starting point.
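As a rough illustration of what such a pipeline could look like, here is a minimal Apache Beam (Python) sketch that reads from BigQuery and writes rows into Postgres with psycopg2; the project, query, table, and connection details are all assumptions, and a real pipeline would batch the writes:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class WriteToPostgres(beam.DoFn):
    """Writes each BigQuery row into Cloud SQL via psycopg2 (placeholder connection)."""
    def setup(self):
        import psycopg2
        self.conn = psycopg2.connect(
            host="127.0.0.1", dbname="mydb", user="myuser", password="...")

    def process(self, row):
        # One INSERT per row for illustration only; batching would be far more efficient.
        with self.conn.cursor() as cur:
            cur.execute(
                "INSERT INTO my_table (id, name) VALUES (%s, %s)",
                (row["id"], row["name"]),
            )
        self.conn.commit()

    def teardown(self):
        self.conn.close()

options = PipelineOptions(project="my-project", temp_location="gs://my-bucket/tmp")
with beam.Pipeline(options=options) as p:
    (p
     | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
           query="SELECT id, name FROM dataset.preprocessed", use_standard_sql=True)
     | "WriteToSQL" >> beam.ParDo(WriteToPostgres()))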
Alternative solution to Cloud Dataflow
Cloud SQL for PostgreSQL doesn't support importing from a .CSV file, but it does support .SQL files.
The file type for the specified uri.
SQL: The file contains SQL statements.
CSV: The file contains CSV data. Importing CSV data using the Cloud SQL Admin API is not supported for PostgreSQL instances.
A direct solution would be to convert the .CSV files to .SQL with some tool (Google doesn't provide one that I know of, but there are many online) and then import the result into the PostgreSQL instance.
If you want to implement this solution in a more "programmatic" way, I would suggest using Cloud Functions. Here is an example of how I would try to do it: trigger a Cloud Function whenever a new file lands in the GCS bucket, check whether it is a .CSV, and if it is, use a csv-to-sql API (example of API here) to convert the file to .SQL, then import the result into the Cloud SQL instance.
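As a rough sketch of such a function, assuming a background trigger on the bucket's finalize event: since I can't point to a specific csv-to-sql API, the conversion here is done naively with Python's csv module, and the table and file names are placeholders.

import csv
import io
from google.cloud import storage

def on_csv_upload(event, context):
    """Triggered when a new object is finalized in the GCS bucket."""
    bucket_name = event["bucket"]
    file_name = event["name"]
    if not file_name.lower().endswith(".csv"):
        return  # ignore anything that isn't a CSV export

    client = storage.Client()
    bucket = client.bucket(bucket_name)
    text = bucket.blob(file_name).download_as_text()

    # Naive CSV -> SQL conversion; a real csv-to-sql API (or parameterized import)
    # would handle types and escaping more robustly.
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    statements = []
    for row in reader:
        values = ", ".join("'" + v.replace("'", "''") + "'" for v in row)
        statements.append(
            "INSERT INTO my_table (%s) VALUES (%s);" % (", ".join(header), values))

    # Write the .SQL file back to the bucket, ready to be imported into Cloud SQL.
    sql_name = file_name.rsplit(".", 1)[0] + ".sql"
    bucket.blob(sql_name).upload_from_string("\n".join(statements))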
Before you begin, you should make sure that:

The database and table you are importing into already exist on your Cloud SQL instance.
The CSV files have one line for each row of data and use comma-separated fields.
Then, you can import data into a Cloud SQL instance from a CSV file in a GCS bucket by following these steps with gcloud:
Describe the instance you are importing to:

gcloud sql instances describe [INSTANCE_NAME]

Copy the serviceAccountEmailAddress field.

Add the service account to the bucket ACL as a writer:

gsutil acl ch -u [SERVICE_ACCOUNT_ADDRESS]:W gs://[BUCKET_NAME]

Add the service account to the import file as a reader:

gsutil acl ch -u [SERVICE_ACCOUNT_ADDRESS]:R gs://[BUCKET_NAME]/[IMPORT_FILE_NAME]

Import the file:

gcloud sql import csv [INSTANCE_NAME] gs://[BUCKET_NAME]/[FILE_NAME] \
--database=[DATABASE_NAME] --table=[TABLE_NAME]

If you don't need to retain the permissions afterwards, remove the service account from the bucket ACL:

gsutil acl ch -d [SERVICE_ACCOUNT_ADDRESS] gs://[BUCKET_NAME]
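Since your original goal was to drive this from a Python script, the same steps could be wrapped in subprocess calls along these lines (a sketch only; the instance, bucket, database, and table names are placeholders):

import subprocess

INSTANCE = "my-instance"          # placeholder Cloud SQL instance name
BUCKET = "gs://my-export-bucket"  # placeholder GCS bucket
CSV_URI = BUCKET + "/preprocessed/data.csv"

def run(cmd):
    # Run a gcloud/gsutil command and fail loudly on a non-zero exit code.
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Look up the instance's service account (the serviceAccountEmailAddress field).
describe = subprocess.run(
    ["gcloud", "sql", "instances", "describe", INSTANCE,
     "--format=value(serviceAccountEmailAddress)"],
    check=True, capture_output=True, text=True)
service_account = describe.stdout.strip()

# Grant the service account access, run the import, then revoke the access.
run(["gsutil", "acl", "ch", "-u", service_account + ":W", BUCKET])
run(["gsutil", "acl", "ch", "-u", service_account + ":R", CSV_URI])
run(["gcloud", "sql", "import", "csv", INSTANCE, CSV_URI,
     "--database=mydb", "--table=my_table", "--quiet"])
run(["gsutil", "acl", "ch", "-d", service_account, BUCKET])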