I have a very big table with almost 300 million records. Since SELECT queries against it are too slow, I want to split it into about 800 small tables.
The dataset looks like this:
XXXXXX column2 column3 column4 ...
XXXXXX column2 column3 column4 ...
XXXXXX column2 column3 column4 ...
YYYYYY column2 column3 column4 ...
YYYYYY column2 column3 column4 ...
I want to split the table based on the value of the first column (e.g., records with XXXXXX go into table XXXXXX). What's the quickest way to do it?
Note: I have already added 10 partitions to it, but they don't speed it up much.
Partitioning works as a performance strategy under two circumstances:
1. The primary queries against that table end up doing table or index scans, and the system has adequate resources and appropriate configuration to support a high level of parallelism. So if all of the partitions are on the same physical drive, partitioning doesn't buy you much: you're as I/O bound as you were in the first place. But if you're on a 16-core system, with each partition on a physically distinct disk? Partitioning may produce startling improvements in system performance.
2. The partitioning rule uses an index that features in the most prevalent queries against that table. If you're going for performance by that route, you should partition on an indexed value that is often used to filter or constrain the result set. The most frequent candidate is transaction date, since reporting is often done over a calendar date range. The query optimizer can then use the partitioning rule to restrict work to a single (smaller) partition, or to run two or more partition scans in parallel (subject to the same strictures mentioned above); a sketch of this follows below.
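For example, a date-based range partitioning scheme might look like the following. This is only a sketch: the orders table and the order_date column are illustrative, not from the question.

-- Hypothetical example of range partitioning by date.
CREATE TABLE orders (
    id BIGINT NOT NULL,
    order_date DATE NOT NULL,
    amount DECIMAL(10,2),
    PRIMARY KEY (id, order_date)  -- MySQL requires the partitioning column in every unique key
)
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION p2011 VALUES LESS THAN (2012),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

A query filtering on order_date between '2011-01-01' and '2011-12-31' can then be pruned to the p2011 partition alone.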
I'm presuming that the primary reason you want to split up this table is performance. But 800 partitions? If performance improvement is what you're after, that may be the wrong approach. Enterprise databases keep as much of the upper levels of their table indexes in cache memory as they can, for good performance. In a five-level b-tree, for a moderately used table, it's quite possible that the top three levels stay in cache after their first access (this is a likely configuration for a 300M-row table with an integer primary key). By splitting your table into 800 pieces, you create 800 index structures to try to keep cached (in addition to the table data itself). Chances are, if your access is more or less evenly distributed across the primary key, searching one partition will push other partitions out of cache, to the ultimate detriment of overall performance.
Nevertheless, if you're determined to do this, the easiest way to partition a table into N pieces is to partition it by the modulus of the primary key against the number of partitions you want (primary_key % 800, in your case). Newer versions of MySQL also have hash partitioning support, making partitioning into an arbitrary number of sets fairly straightforward:
PARTITION BY HASH(some_column_value) PARTITIONS number_of_partitions
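Concretely, that could look like the sketch below. It assumes your big table is the MasterTable used in the example further down, with an integer primary key id; both names are illustrative. (As with range partitioning, MySQL requires the hashed column to be part of every unique key on the table.)

-- Sketch only: hash-partition the existing table into 800 buckets.
ALTER TABLE MasterTable
    PARTITION BY HASH(id)  -- MySQL takes the value modulo the partition count
    PARTITIONS 800;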
If you want to put your data into 800 actual tables instead, you'll have to do some editor magic, or use a scripting language, and do it in SQL:
CREATE TABLE table1 LIKE MasterTable;
CREATE TABLE table2 LIKE MasterTable;
CREATE TABLE table3 LIKE MasterTable;
..
INSERT INTO table1 SELECT * FROM MasterTable WHERE id MOD 800 = 0;
INSERT INTO table2 SELECT * FROM MasterTable WHERE id MOD 800 = 1;
INSERT INTO table3 SELECT * FROM MasterTable WHERE id MOD 800 = 2;
You could do this in a loop in your favorite programming language using dynamic SQL; that would probably be the easiest way to generate all 800 statements.
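As a rough illustration, here's a minimal sketch of that loop done entirely in MySQL, using a stored procedure and prepared statements. The procedure name split_by_modulus is hypothetical, and it assumes MasterTable has an integer primary key id, as in the example above.

DELIMITER $$
CREATE PROCEDURE split_by_modulus(IN n_tables INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < n_tables DO
        -- Create an empty clone of the master table for this bucket.
        SET @ddl = CONCAT('CREATE TABLE table', i + 1, ' LIKE MasterTable');
        PREPARE stmt FROM @ddl;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
        -- Copy over the rows whose primary key falls in this bucket.
        SET @dml = CONCAT('INSERT INTO table', i + 1,
                          ' SELECT * FROM MasterTable WHERE id MOD ', n_tables, ' = ', i);
        PREPARE stmt FROM @dml;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
        SET i = i + 1;
    END WHILE;
END$$
DELIMITER ;

You would then run CALL split_by_modulus(800); to produce table1 through table800.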
Thanks to everyone who inspired this stored procedure! I'm happy to share it with you:
DELIMITER $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `split_tablebyrowscnt` (IN `tableName` VARCHAR(40), IN `step` INT)
BEGIN
    SET @table := tableName;
    SET @liminf := 0;     -- lower bound (exclusive) of the current slice
    SET @limsup := step;  -- upper bound (inclusive) of the current slice

    -- Count the rows in the source table so we know when to stop.
    SET @sql_count = CONCAT('SELECT COUNT(*) INTO @rwcnt FROM ', @table);
    PREPARE stmt FROM @sql_count;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;

    WHILE @liminf < @rwcnt DO
        -- Render the upper bound as a string for use in the new table's name.
        SET @limsup_str = CAST(@limsup AS CHAR(10));

        -- Number the source rows and copy the current slice into <table>_<limsup>.
        SET @sql_slice = CONCAT(
            'CREATE TABLE ', @table, '_', @limsup_str,
            ' SELECT * FROM (SELECT @rownum := @rownum + 1 AS rownum, d.*',
            ' FROM ', @table, ' d, (SELECT @rownum := 0) r) t',
            ' WHERE (rownum > ?) AND (rownum <= ?)');
        PREPARE stmt FROM @sql_slice;
        EXECUTE stmt USING @liminf, @limsup;
        DEALLOCATE PREPARE stmt;

        -- The helper column isn't needed in the result tables.
        SET @sql_drop = CONCAT('ALTER TABLE ', @table, '_', @limsup_str, ' DROP COLUMN rownum');
        PREPARE stmt FROM @sql_drop;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;

        SET @liminf = @liminf + step;
        SET @limsup = @limsup + step;
    END WHILE;
END$$
DELIMITER ;
To execute the procedure: CALL split_tablebyrowscnt('myTable', 100); With a step of 100, this produces tables named myTable_100, myTable_200, myTable_300, and so on, each holding at most 100 rows of the original.