I am using mysqldump to create DB dumps of the live application to be used by developers.
The dumps contain customer data. I want to anonymize it, i.e. remove customer names and credit card data.
An option would be:
But this has too much overhead. A better solution would be to do the anonymization during dump creation.
I guess I would end up parsing all the mysqldump output? Are there any smarter solutions?
You can try Myanon: https://myanon.io
Anonymization is done on the fly during dump:
mysqldump | myanon -f db.conf | gzip > anon.sql.gz
Why are you selecting from your tables if you want to randomize the data?
Do a mysqldump of the tables that are safe to dump (configuration tables, etc.) with data, and a mysqldump of your sensitive tables with structure only.
Then, in your application, you can construct the INSERT statements for the sensitive tables based on your randomly created data.
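A minimal sketch of that last step, assuming the application is written in Python. The table name `customer`, its columns, and the helpers `fake_customer`, `sql_quote` and `insert_statement` are hypothetical and need to be adapted to the real schema; the escaping is deliberately simple and only meant for generated test data.

```python
import random
import string


def sql_quote(value):
    """Render a Python value as a MySQL literal (minimal escaping, sketch only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def random_word(rng, length=8):
    """Return a random lowercase word of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def fake_customer(rng, customer_id):
    """Build one row of made-up customer data (hypothetical columns)."""
    name = random_word(rng).capitalize()
    return {
        "id": customer_id,
        "name": name,
        "email": name.lower() + "@example.com",
        "card_number": "0000000000000000",
    }


def insert_statement(table, row):
    """Turn a column -> value mapping into a single INSERT statement."""
    columns = ", ".join("`" + col + "`" for col in row)
    values = ", ".join(sql_quote(v) for v in row.values())
    return "INSERT INTO `" + table + "` (" + columns + ") VALUES (" + values + ");"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same rows
    for i in range(1, 4):
        print(insert_statement("customer", fake_customer(rng, i)))
```

The printed statements can be appended to the structure-only dump of the sensitive tables, so developers get a loadable file that never contained real customer data.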
I had to develop something similar a few days ago. I couldn't use INTO OUTFILE because the DB is on AWS RDS. I ended up with this approach:
Dump data in tabular text form from some table:
mysql -B -e 'SELECT `address`.`id`, "address1" , "address2", "address3", "town", "00000000000" as `contact_number`, "[email protected]" as `email` FROM `address`' some_db > addresses.txt
And then to import it:
mysql --local-infile=1 -e "LOAD DATA LOCAL INFILE 'addresses.txt' INTO TABLE \`address\` FIELDS TERMINATED BY '\t' ENCLOSED BY '\"' IGNORE 1 LINES" some_db
Only the mysql command is required to do this.
The export is pretty quick (a couple of seconds for ~30,000 rows). The import is a bit slower, but still fine. I had to join a few tables along the way and there were some foreign keys, so it will surely be faster if you don't need that. Disabling foreign key checks while importing will also speed things up.
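The foreign key tip could look roughly like this, assuming the import is driven from a Python script. `build_import_sql` and `build_import_command` are hypothetical helpers; they wrap the same LOAD DATA statement as above in `SET FOREIGN_KEY_CHECKS=0` / `=1`, which only affects the session opened by that one `mysql` call.

```python
def build_import_sql(table, infile):
    """Return SQL that loads `infile` into `table` with foreign key checks off."""
    load = (
        "LOAD DATA LOCAL INFILE '" + infile + "' "
        "INTO TABLE `" + table + "` "
        "FIELDS TERMINATED BY '\\t' ENCLOSED BY '\"' "
        "IGNORE 1 LINES"
    )
    # FOREIGN_KEY_CHECKS is set per session, so all three statements
    # must be sent through the same mysql invocation.
    return "SET FOREIGN_KEY_CHECKS=0; " + load + "; SET FOREIGN_KEY_CHECKS=1;"


def build_import_command(database, table, infile):
    """Return the argv list for the mysql client (no shell quoting needed)."""
    return ["mysql", "--local-infile=1", "-e", build_import_sql(table, infile), database]


if __name__ == "__main__":
    # Names taken from the commands above; adjust to your setup.
    print(build_import_command("some_db", "address", "addresses.txt"))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run the import. Because the command is a list rather than a shell string, the backticks and quotes do not need extra escaping.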