Best Practice to migrate data from MySQL to BigQuery

I tried several CSV formats (different escape characters, quoting, and other settings) to export data from MySQL and import it into BigQuery, but I was not able to find a solution that works in every case.

Google Cloud SQL requires the following code for importing/exporting to and from MySQL. Although Cloud SQL is not BigQuery, it is a good starting point:

SELECT * INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8' 
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '' FROM table

At the moment I use the following command to import a compressed CSV into BigQuery:

bq --nosync load -F "," --null_marker "NULL" --format=csv PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json

On the one hand, the bq command does not allow setting the escape character (" is escaped by another ", which seems to be a well-defined CSV format). On the other hand, using \" as the escape character for the MySQL export turns NULL values into "N, which does not work either:

CSV table references column position 34, but line starting at position:0 contains only 34 columns. (error code: invalid)
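To illustrate with a hypothetical two-column row whose second value is NULL, the different ESCAPED BY settings produce roughly:

ESCAPED BY '\\' (MySQL default):  "abc",\N
ESCAPED BY '\"':                  "abc","N
ESCAPED BY '':                    "abc",NULL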

So my question is: how do I write a (table-independent) export statement for MySQL such that the generated file can be loaded into BigQuery? Which escape character should be used, and how should NULL values be handled?

2 Answers

I've been running into the same problem; here's my solution:

Exporting data from MySQL

First, export the data from MySQL this way:

SELECT * INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8'
FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY ''
FROM <yourtable>

This is in reality a TSV file (tab-separated values), but you can import it as CSV anyway.
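The bq command below reads the file from Cloud Storage, so the export still has to be compressed and uploaded first. A minimal sketch, assuming the bucket name gs://bucket and that SELECT ... INTO OUTFILE wrote the file to a directory you can read on the MySQL server (often something like /var/lib/mysql-files/):

# Compress the export and stage it in Cloud Storage (paths and bucket name are assumptions)
gzip /var/lib/mysql-files/filename.csv
gsutil cp /var/lib/mysql-files/filename.csv.gz gs://bucket/data.csv.gz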

Import into BigQuery

This way you should be able to import it into BigQuery with the following parameters:

bq load --field_delimiter="\t" --null_marker="\N" --quote="" \
PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json

Notes

  1. If any field in your MySQL database contains a tab character (\t), it will break your columns. To prevent that, you can wrap the affected columns in REPLACE(<column>, '\t', ' '), which converts tabs to spaces (see the sketch after these notes).

  2. If you set the table schema in BigQuery's web interface, you won't need to specify it every time you load a CSV.
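For example, a sketch of note 1 applied to a table with hypothetical columns id, description, and created_at (using REPLACE means listing the columns explicitly instead of SELECT *):

SELECT
  id,
  REPLACE(description, '\t', ' '),   -- strip tabs from free-text columns
  created_at
INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8'
FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY ''
FROM <yourtable>;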

I hope this works for you.

answered by blmayer


You could try sqldump-to. It reads in any MySQL-compatible dump stream and outputs newline-delimited JSON for easy import into BigQuery.

The problem with CSV or TSV is escape characters; JSON doesn't really have that problem.

The tool also supports schema export, which will need to be edited afterwards with specific BigQuery data types per column, but it's a useful head start.

For example, use mysqldump to stream into sqldump-to:

mysqldump -u user -psecret dbname | sqldump-to --dir-output ./dbname --schema

You may need to modify the mysqldump command to match your particular MySQL configuration (e.g. remote servers, etc.).
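For instance, a sketch for a remote server (hostname, port, and credentials are placeholders; --single-transaction avoids locking InnoDB tables during the dump):

# Dump from a remote host and stream straight into sqldump-to
mysqldump --single-transaction -h db.example.com -P 3306 -u user -p dbname \
  | sqldump-to --dir-output ./dbname --schema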

If you already have a dump file, the tool also supports multiple workers to better utilize your CPU.

Once sqldump-to has created your JSON files, simply use the bq command line tool to load into BigQuery:

bq load --source_format=NEWLINE_DELIMITED_JSON datasetname.tablename tablename.json tablename_schema.json
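If the dump produces many JSON files, one option (a sketch, assuming a hypothetical bucket name and that sqldump-to wrote its output to ./dbname/) is to stage them in Cloud Storage and load them with a single wildcard URI:

# Stage the generated JSON files in Cloud Storage (bucket name is an assumption)
gsutil -m cp ./dbname/*.json gs://my-bucket/dbname/

# Load all staged files into the target table at once
bq load --source_format=NEWLINE_DELIMITED_JSON \
  datasetname.tablename "gs://my-bucket/dbname/*.json" tablename_schema.json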
answered by Arjun Mehta