
Migrating data over to BigQuery from Redshift

I want to migrate 1 TB of data spread across ~100 tables from Redshift to BigQuery.

Are there any tools for this kind of data migration? Something like 'pgloader' (used to migrate from MySQL to PostgreSQL) would save a lot of time.

asked Sep 23 '16 08:09 by vaichidrewar


1 Answer

2019 update - now officially supported:

  • https://cloud.google.com/bigquery/docs/redshift-migration

There are two open-source alternatives that I know of on GitHub:

  • https://github.com/iconara/bigshift
  • https://github.com/uswitch/bqshift

Both seem fairly well maintained; bigshift has been around longer and, for now, has more complete documentation.

Quoting the bigshift docs on why such a tool is useful for this migration:

The CSV produced by Redshift's UNLOAD can't be loaded into BigQuery no matter what options you specify on either end. Redshift can quote all fields or none, but BigQuery doesn't allow non-string fields to be quoted. The format of booleans and timestamps are not compatible, and they expect quotes in quoted fields to be escaped differently, to name a few things.

This means that a lot of what BigShift does is make sure that the data that is dumped from Redshift is compatible with BigQuery. To do this it reads the table schema and translates the different datatypes while the data is dumped. Quotes are escaped, timestamps formatted, and so on.
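To make the kind of translation described in that quote concrete, here is a minimal Python sketch. It is not BigShift's actual code. The input conventions (booleans dumped as `t`/`f`, timestamps as `YYYY-MM-DD HH:MM:SS`), the target formats, and the names `convert_value` and `to_csv_line` are all assumptions chosen for illustration; check what your own dump really contains before relying on any of it.

```python
import csv
import io
from datetime import datetime

# Assumed (hypothetical) source convention: booleans arrive as "t" / "f".
BOOL_MAP = {"t": "true", "f": "false"}


def convert_value(raw, col_type):
    """Translate one dumped value into the assumed target text form."""
    if raw is None:
        return ""  # NULL becomes an empty field
    if col_type == "boolean":
        return BOOL_MAP[raw]
    if col_type == "timestamp":
        # Assumed source format "YYYY-MM-DD HH:MM:SS", rewritten as ISO 8601.
        parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
        return parsed.strftime("%Y-%m-%dT%H:%M:%S")
    return raw  # integers, strings, etc. pass through unchanged


def to_csv_line(row, col_types):
    """Write one row as CSV, quoting only fields that need it.

    QUOTE_MINIMAL leaves plain numbers and booleans unquoted and escapes
    embedded quotes by doubling them.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow([convert_value(v, t) for v, t in zip(row, col_types)])
    return buf.getvalue()


if __name__ == "__main__":
    types = ["integer", "boolean", "timestamp", "varchar"]
    row = ["1", "t", "2016-09-23 08:09:00", 'say "hi"']
    print(to_csv_line(row, types), end="")
```

The point of the sketch is that the conversion has to be schema-aware: the same raw text is handled differently depending on the column type, which is why a tool that reads the table schema first saves so much manual work.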

answered Oct 15 '22 21:10 by Felipe Hoffa