I tried several CSV formats (different escape characters, quotes and other settings) to export data from MySQL and import it into BigQuery, but I was not able to find a solution that works in every case.
Google Cloud SQL requires the following code for importing/exporting from/to MySQL. Although Cloud SQL is not BigQuery, it is a good starting point:
SELECT * INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '' FROM table
At the moment I use the following command to import a compressed CSV into BigQuery:
bq --nosync load -F "," --null_marker "NULL" --format=csv PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json
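For reference, table_schema.json is a standard BigQuery schema file; a minimal sketch of its contents (field names and types here are placeholders, not my real schema):
cat > table_schema.json <<'EOF'
[
  {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "name", "type": "STRING", "mode": "NULLABLE"},
  {"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"}
]
EOF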
On the one hand, the bq command does not allow setting the escape character (" is escaped by another ", which seems to be a well-defined CSV format). On the other hand, \" as the escape character for the MySQL export would lead to "N as the null value, which does not work either:
CSV table references column position 34, but line starting at position:0 contains only 34 columns. (error code: invalid)
So my question is: how do I write a (table-independent) export statement for MySQL such that the generated file can be loaded into BigQuery? Which escape character should be used, and how should null values be handled/set?
I've been running into the same problem; here's my solution:
First, export the data from MySQL this way:
SELECT * INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8'
FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY ''
FROM <yourtable>
This is in reality a TSV file (tab-separated values), but you can import it as CSV nonetheless.
This way you should be able to import it into BigQuery with the following parameters:
bq load --field_delimiter="\t" --null_marker="\N" --quote="" \
PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json
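Note that SELECT ... INTO OUTFILE writes a local file on the MySQL server, while bq load above reads from Cloud Storage, so you still need to compress and upload the file in between. A minimal sketch of that step, assuming gsutil is set up and using a placeholder bucket name:
gzip filename.csv
gsutil cp filename.csv.gz gs://bucket/data.csv.gz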
If any field in your MySQL database contains a tab character (\t), it will break your columns. To prevent that, you can apply the SQL function REPLACE(<column>, '\t', ' ') to those columns, which converts tabs to spaces.
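For illustration, a minimal sketch of such an export with the replacement applied (table and column names here are placeholders, not from the original answer):
SELECT id, REPLACE(description, '\t', ' ') AS description
INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8'
FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY ''
FROM mytable;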
If you set the table schema in BigQuery's web interface, you won't need to specify it every time you load a CSV.
I hope this works for you.
You could try sqldump-to. It reads in any MySQL-compatible dump stream and outputs newline-delimited JSON for easy import into BigQuery.
The problem with CSV or TSV is escape characters; JSON doesn't really have that problem.
The tool also supports schema export, which will need to be edited afterwards with specific BigQuery data types per column, but it's a useful head start.
For example, use mysqldump to stream into sqldump-to:
mysqldump -u user -psecret dbname | sqldump-to --dir-output ./dbname --schema
You may need to modify the mysqldump command to match your particular MySQL configuration (e.g. remote servers, etc.).
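For instance, a sketch of the same pipeline pointed at a remote server (hostname, port and credentials are placeholders):
mysqldump -h db.example.com -P 3306 -u user -psecret dbname | sqldump-to --dir-output ./dbname --schema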
If you already have a dump file, the tool also supports multiple workers to better utilize your CPU.
Once sqldump-to has created your JSON files, simply use the bq command-line tool to load them into BigQuery:
bq load --source_format=NEWLINE_DELIMITED_JSON datasetname.tablename tablename.json tablename_schema.json