Best Practice to migrate data from MySQL to BigQuery

I tried several CSV formats (different escape characters, quoting, and other settings) to export data from MySQL and import it into BigQuery, but I was not able to find a solution that works in every case.

Google Cloud SQL requires the following code for importing/exporting to and from MySQL. Although Cloud SQL is not BigQuery, it is a good starting point:

SELECT * INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8' 
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '' FROM table

At the moment I use the following command to import a compressed CSV into BigQuery:

bq --nosync load -F "," --null_marker "NULL" --format=csv PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json

On the one hand, the bq command does not allow setting the escape character (" is escaped by another ", which seems to be a well-defined CSV format). On the other hand, using \" as the escape character for the MySQL export turns NULL values into "N, which does not work either:

CSV table references column position 34, but line starting at position:0 contains only 34 columns. (error code: invalid)
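To illustrate with a hypothetical two-column row whose second value is NULL, the different ESCAPED BY settings produce roughly:

ESCAPED BY '\\' (MySQL default):  "abc",\N
ESCAPED BY '\"':                  "abc","N
ESCAPED BY '':                    "abc",NULL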

So my question is: how do I write a (table-independent) export statement for MySQL such that the generated file can be loaded into BigQuery? Which escape character should be used, and how should NULL values be handled?

2 Answers

I've been running into the same problem; here's my solution:

Exporting data from MySQL

First, export the data from MySQL this way:

SELECT * INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8'
FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY ''
FROM <yourtable>

This is in reality a TSV file (tab-separated values), but you can import it as CSV anyway.
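The bq command below reads the file from Cloud Storage, so the export still has to be compressed and uploaded first. A minimal sketch, assuming the bucket name gs://bucket and that SELECT ... INTO OUTFILE wrote the file to a directory you can read on the MySQL server (often something like /var/lib/mysql-files/):

# Compress the export and stage it in Cloud Storage (paths and bucket name are assumptions)
gzip /var/lib/mysql-files/filename.csv
gsutil cp /var/lib/mysql-files/filename.csv.gz gs://bucket/data.csv.gz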

Import into BigQuery

This way you should be able to import it into BigQuery with the following parameters:

bq load --field_delimiter="\t" --null_marker="\N" --quote="" \
PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json

Notes

  1. If any field in your MySQL database contains a tab character (\t), it will break your columns. To prevent that, you can wrap the affected columns in REPLACE(<column>, '\t', ' '), which converts tabs to spaces (see the sketch after these notes).

  2. If you set the table schema in BigQuery's web interface, you won't need to specify it every time you load a CSV.
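For example, a sketch of note 1 applied to a table with hypothetical columns id, description, and created_at (using REPLACE means listing the columns explicitly instead of SELECT *):

SELECT
  id,
  REPLACE(description, '\t', ' '),   -- strip tabs from free-text columns
  created_at
INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8'
FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY ''
FROM <yourtable>;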

I hope this works for you.

answered by blmayer


You could try sqldump-to. It reads in any MySQL-compatible dump stream and outputs newline-delimited JSON for easy import into BigQuery.

The problem with CSV or TSV is escape characters; JSON doesn't really have that problem.

The tool also supports schema export, which will need to be edited afterwards with specific BigQuery data types per column, but it's a useful head start.

For example, use mysqldump to stream into sqldump-to:

mysqldump -u user -psecret dbname | sqldump-to --dir-output ./dbname --schema

You may need to modify the mysqldump command to match your particular MySQL configuration (e.g. remote servers, etc.).
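For instance, a sketch for a remote server (hostname, port, and credentials are placeholders; --single-transaction avoids locking InnoDB tables during the dump):

# Dump from a remote host and stream straight into sqldump-to
mysqldump --single-transaction -h db.example.com -P 3306 -u user -p dbname \
  | sqldump-to --dir-output ./dbname --schema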

If you already have a dump file, the tool also supports multiple workers to better utilize your CPU.

Once sqldump-to has created your JSON files, simply use the bq command line tool to load into BigQuery:

bq load --source_format=NEWLINE_DELIMITED_JSON datasetname.tablename tablename.json tablename_schema.json
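If the dump produces many JSON files, one option (a sketch, assuming a hypothetical bucket name and that sqldump-to wrote its output to ./dbname/) is to stage them in Cloud Storage and load them with a single wildcard URI:

# Stage the generated JSON files in Cloud Storage (bucket name is an assumption)
gsutil -m cp ./dbname/*.json gs://my-bucket/dbname/

# Load all staged files into the target table at once
bq load --source_format=NEWLINE_DELIMITED_JSON \
  datasetname.tablename "gs://my-bucket/dbname/*.json" tablename_schema.json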
answered by Arjun Mehta