
Populate a MySQL table with a large series of rows quickly

Tags:

mysql

I have a MySQL Table that looks like this:

MySQL Table: status

The SQL to create the structure is:

CREATE TABLE `status` (
`id` INT(11) NOT NULL,
`responseCode` INT(3) NOT NULL DEFAULT '503',
`lastUpdate` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

It stores a unique id, responseCode, and lastUpdate. The responseCode is an HTTP Request response code: 404, 500, 503, 200, etc.

I have a URL to correspond to each id for which I make an HTTP request and record in this table the time I made the request and the response received.

The script makes this query against the status table:

SELECT id FROM status WHERE lastUpdate < 'XXXX' OR 
(responseCode != 200 AND responseCode != 404) 
ORDER BY id DESC LIMIT 100

Where XXXX would be a date where I decide that anything older than that date needs to be refreshed regardless of the response code. Further, I want to reattempt the HTTP request if I didn't get a 200 or 404 regardless of the last lastUpdate date. I LIMIT to 100 because I only run 100 at a time, and then I have it sleep for a while and do another 100 later, and so on.

Anyways, all that's fine, but what I want to do is populate the table ahead of time with say a series like this:

(1, 503, NOW()), (2, 503, NOW()), (3, 503, NOW()) ... (100000, 503, NOW())

Notice, only the ID is incrementing, but it may not necessarily start at 1 for my needs. I want the table pre-populated like this, because then the query above can keep grabbing id's for ones we need to reattempt, and I'd like to not have to ever insert anything more into the status table as the id's are finite and will not change (but there are many of them).

I tried using Java (though PHP, C#, or whatever else would be the same concept; it doesn't matter to me which language I use here):

PreparedStatement st = conn.prepareStatement("INSERT INTO status VALUES (?,default,default)");

for( int i = 1; i <= 100000; i++ ) {
    st.setInt(1,i);
    st.addBatch();
}

System.out.println( "Running batch..." );
st.executeBatch();
System.out.println( "Batch done!" );

This starts the inserts, but the issue is that it takes an extraordinary amount of time to populate the table (I don't have an exact time, but it was running for hours). So, my question boils down to: is there an easy and efficient way to populate a MySQL table with a mass amount of rows like this?
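One common speedup (not from the original question) is to pack many rows into a single multi-row INSERT instead of issuing one statement per row, and to run the whole load inside one transaction. A minimal sketch of building such a statement is below; the chunk size of 1000 and the connection handling in the comments are assumptions, not part of the original code.

```java
// Sketch: pack many rows into one INSERT statement instead of one
// statement per row. Chunk size and connection URL are assumptions.
public class StatusSeeder {

    // Build a single multi-row INSERT for ids in [from, to] inclusive.
    static String buildInsert(int from, int to) {
        StringBuilder sql = new StringBuilder(
            "INSERT INTO status (id, responseCode, lastUpdate) VALUES ");
        for (int id = from; id <= to; id++) {
            if (id > from) sql.append(", ");
            sql.append("(").append(id).append(", 503, NOW())");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Against a live server you would send the range in chunks so no
        // single statement exceeds max_allowed_packet, e.g. (sketch only):
        //
        // try (java.sql.Connection conn =
        //          java.sql.DriverManager.getConnection(url)) {
        //     conn.setAutoCommit(false);          // one transaction
        //     try (java.sql.Statement st = conn.createStatement()) {
        //         for (int start = 1; start <= 100000; start += 1000) {
        //             st.executeUpdate(buildInsert(start, start + 999));
        //         }
        //     }
        //     conn.commit();
        // }
        System.out.println(buildInsert(1, 3));
    }
}
```

With MySQL Connector/J, setting `rewriteBatchedStatements=true` on the connection URL makes the driver perform a similar rewrite on `addBatch()` calls automatically, which can turn the hours-long batch above into seconds.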

asked Oct 08 '12 by user17753


2 Answers

Generally speaking, you can use any one or more of the following:

  • Start a transaction, do inserts, commit
  • Pack multiple values into a single insert into query
  • Drop any constraints before doing the insert and reinstate them after the mass insert (except possibly the primary key, though I'm not sure about that)
  • Use insert into ... select if suitable

The first (using transactions) is most likely to help, but note it won't apply to MyISAM tables, which are non-transactional; with InnoDB it does a very good job. (I only use MySQL when I'm forced to; I prefer PostgreSQL.)

In your specific case, inserting 100000 rows of data, you could do the following:

INSERT INTO status(id, responseCode, lastUpdate) SELECT @row := @row + 1 as row, 503, NOW() FROM 
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t2, 
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t3, 
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t4, 
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t5, 
(SELECT @row:=0) t6;
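To unpack what the query above computes: each derived table t through t5 is a ten-row list of digits, so their cross join yields 10^5 = 100,000 combinations, and the @row user variable simply numbers them 1 through 100,000. Since the asker's ids may not start at 1, initializing the variable differently (e.g. `SELECT @row := 50000` in t6) would shift the whole series. The counting can be sketched as:

```java
// Sketch of what the digit-table cross join computes: 10^digits
// combinations, numbered sequentially from offset + 1 (the offset
// parameter is an illustration, mirroring SELECT @row := <offset>).
public class RowGenDemo {
    static int[] generateIds(int digits, int offset) {
        int total = (int) Math.pow(10, digits); // 10^5 = 100000 combinations
        int[] ids = new int[total];
        for (int i = 0; i < total; i++) {
            ids[i] = offset + i + 1; // mirrors @row := @row + 1
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = generateIds(5, 0);
        System.out.println(ids.length);                         // 100000
        System.out.println(ids[0] + ".." + ids[ids.length - 1]); // 1..100000
    }
}
```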

Tested this on my machine, got:

Query OK, 100000 rows affected (0.70 sec)
Records: 100000  Duplicates: 0  Warnings: 0

I'm pretty sure you can't get much faster than that for 100000 rows.

answered Sep 24 '22 by xception


How about setting AUTO_INCREMENT on the primary key?

Then inserting the first hundred (or thousand) rows whichever way you like (your example or the example DocJones gave you).

Then using

INSERT INTO table SELECT NULL, '503', NOW() FROM table;

...repeatedly a few times. Each run doubles the table's size.

The NULL in the first slot of the SELECT ensures the AUTO_INCREMENT kicks in and increments id.

If you want to grow the table even faster you can do

INSERT INTO table SELECT NULL, '503', NOW() FROM table AS t1 CROSS JOIN table t2;

...repeatedly a few times; each run grows the table from its previous size n to n + n² (e.g. 100 becomes 100 + 100² = 10,100).
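A quick sketch of the two growth rates (not from the original answer): the plain self-insert adds n rows to a table of n rows, while the CROSS JOIN variant adds n² rows, since joining the table with itself produces one row per pair.

```java
// Compare the growth rates of the two self-insert techniques.
public class GrowthDemo {
    // INSERT ... SELECT ... FROM table             -> adds n rows
    static long doubleInsert(long n)    { return n + n; }

    // INSERT ... SELECT ... FROM t1 CROSS JOIN t2  -> adds n * n rows
    static long crossJoinInsert(long n) { return n + n * n; }

    public static void main(String[] args) {
        long n = 100;
        System.out.println(crossJoinInsert(n)); // 100 + 100*100 = 10100
        System.out.println(doubleInsert(n));    // 100 + 100     = 200
    }
}
```

So starting from 100 seed rows, two cross-join runs already exceed 100,000 rows, whereas doubling would need about ten runs.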

This also allows you to customize the values inserted; for example, if you want to create "random" responseCodes you can use something like CONCAT(ROUND(1+RAND()*4), '0', ROUND(RAND()*5)), which will give you response codes ranging from 100 to 505.

answered Sep 22 '22 by Mihai Stancu