 

MySQL Workbench: replace one column from a file

I have a 1,000,000 row .csv file that I uploaded into a table using MySQL Workbench, but I forgot to make the dates YYYY-MM-DD before I started, so they all uploaded as 0000-00-00.

It took almost 8 hours to upload the million records, so I'd REALLY like to not have to do it all over again, but I can't figure out if there's a way for me to replace JUST that one column of data from the same file I originally uploaded, now that I've changed the dates to be in the correct format.

Does anyone know if this is possible?

Edit

It's WAY too long to post everything, but here's the SHOW CREATE TABLE output with some of the meat taken out:

CREATE TABLE `myTable`
(   `lineID` int(11) NOT NULL AUTO_INCREMENT,
    `1` varchar(50) DEFAULT NULL,
    `2` varchar(1) DEFAULT NULL,
    `3` int(4) DEFAULT NULL,
    `4` varchar(20) DEFAULT NULL,
    `DATE` date DEFAULT NULL,
    PRIMARY KEY (`lineID`)
) ENGINE=InnoDB AUTO_INCREMENT=634205 DEFAULT CHARSET=utf8

MySQL version is 5.6.20.

Screenshot: (image not reproduced here)

Brian Powell, asked Sep 21 '15 18:09



1 Answer

OK. I would recommend using LOAD DATA INFILE explicitly. For those who have not used it, think of it for now as the reverse of a SELECT: it reads rows from a file straight into a table.
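For the actual problem here — replacing just the date column without re-importing everything — one possible approach (a sketch only: the staging table name, column names, and file path are made up, and it assumes the corrected CSV loads in the same row order as the original import so the staging table's counter lines up with lineID; spot-check a few rows before trusting it, and add an offset if lineID did not start at 1) is to LOAD DATA INFILE into a small staging table and then UPDATE with a join:

```sql
-- Staging table: same shape as the CSV, plus its own counter.
CREATE TABLE stage_dates
(   stageID int(11) NOT NULL AUTO_INCREMENT,
    col1 varchar(50) DEFAULT NULL,
    col2 varchar(1) DEFAULT NULL,
    col3 int(4) DEFAULT NULL,
    col4 varchar(20) DEFAULT NULL,
    fixedDate date DEFAULT NULL,
    PRIMARY KEY (stageID)
) ENGINE=InnoDB;

LOAD DATA INFILE '/path/to/corrected.csv'   -- hypothetical path
INTO TABLE stage_dates
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES                              -- only if the CSV has a header row
(col1, col2, col3, col4, fixedDate);

-- Copy only the repaired dates across, leaving every other column alone.
UPDATE myTable m
JOIN stage_dates s ON s.stageID = m.lineID
SET m.`DATE` = s.fixedDate;
```

Loading a million-row staging table this way should take minutes, not hours, and the joined UPDATE touches only the one column that needs fixing.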

Here is a nice article on performance and strategies titled Testing the Fastest Way to Import a Table into MySQL. Don't let the MySQL version in the title or inside the article scare you away. Jumping to the bottom and picking out some conclusions:

The fastest way you can import a table into MySQL without using raw files is the LOAD DATA syntax. Use parallelization for InnoDB for better results, and remember to tune basic parameters like your transaction log size and buffer pool. Careful programming and importing can make a >2-hour problem become a 2-minute process. You can temporarily disable some security features for extra performance.

There are also fine points in there, mainly in the peer comments back and forth about secondary indexes (which you do not have). The important point for others is to add them after the import.
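For readers whose table does have secondary indexes, the "add them after the fact" advice looks like this (a sketch with a hypothetical index name; the general idea is to let the import write plain rows and build the index once at the end rather than maintaining it row by row):

```sql
-- Drop the secondary index before a big import...
ALTER TABLE myTable DROP INDEX idx_date;     -- hypothetical index name

-- ...run the LOAD DATA INFILE, then rebuild the index in one pass:
ALTER TABLE myTable ADD INDEX idx_date (`DATE`);
```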

I hope these links are useful. And your data comes in ... in 10 minutes (in another test table with LOAD DATA INFILE).

General Comments

About the slowest way to do it is in a programming language via a while loop, row by row. Considerably faster is batching, where one INSERT statement passes along, say, 200 to 1,000 rows at a time. Up substantially from that in performance is LOAD DATA INFILE. Fastest is raw files (what I do, but beyond the scope of this answer).
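To make the middle tier of that ranking concrete, a batched insert passes many rows in one statement and one round trip (the values below are made up; note the backticks, since this table's column names are numeric):

```sql
-- One statement, many rows: much faster than one INSERT per row,
-- but still slower than LOAD DATA INFILE for a million rows.
INSERT INTO myTable (`1`, `2`, `3`, `4`, `DATE`)
VALUES
    ('alpha', 'A', 1001, 'first',  '2015-09-01'),
    ('beta',  'B', 1002, 'second', '2015-09-02'),
    ('gamma', 'C', 1003, 'third',  '2015-09-03');
    -- ... up to a few hundred or a thousand rows per statement
```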

Drew, answered Oct 19 '22 09:10
