Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mysql 5.5 Table partition user and friends

I have two tables in my db that have millions of rows now, the selection and insertion is getting slower and slower.

I am using spring+hibernate+mysql 5.5 and read about the sharding as well as partitioning the table and like the idea of partitioning my tables,

My current Db structure is like

CREATE TABLE `user` (
  `id` BIGINT(20) NOT NULL,
  `name` VARCHAR(255) DEFAULT NULL,
  `email` VARCHAR(255) DEFAULT NULL,
  `location_id` bigint(20) default NULL,
  `updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`),
  KEY `FK3DC99772C476E06B` (`location_id`),
  CONSTRAINT `FK3DC99772C476E06B` FOREIGN KEY (`location_id`) REFERENCES `places` (`id`) 
) ENGINE=INNODB DEFAULT CHARSET=utf8


CREATE TABLE `friends` (
  `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `user_id` BIGINT(20) DEFAULT NULL,
  `friend_id` BIGINT(20) DEFAULT NULL,
  `updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `unique_friend` (`user_id`,`friend_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8

Now I am testing how to better use partitioning, for user table following I thought will be good based on by usage.

CREATE TABLE `user_partition` (
  `id` BIGINT(20) NOT NULL,
  `name` VARCHAR(255) DEFAULT NULL,
  `email` VARCHAR(255) DEFAULT NULL,
  `location_id` bigint(20) default NULL,
  `updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`),
  KEY `FK3DC99772C476E06B` (`location_id`) 
) ENGINE=INNODB DEFAULT CHARSET=utf8
PARTITION BY HASH(id DIV 100000)
PARTITIONS 30;

I created a procedures to load data in two table and check the performance of the two tables

DELIMITER //
CREATE PROCEDURE load_partition_table()
BEGIN
DECLARE v INT DEFAULT 0;
    WHILE v < 1000000
    DO
    INSERT INTO user_partition (id,NAME,email)
    VALUES (v,CONCAT(v,' name'),CONCAT(v,'@yahoo.com')),
    (v+1,CONCAT(v+1,' name'),CONCAT(v+1,'@yahoo.com')),
    (v+2,CONCAT(v+2,' name'),CONCAT(v+2,'@yahoo.com')),
    (v+3,CONCAT(v+3,' name'),CONCAT(v+3,'@yahoo.com')),
    (v+4,CONCAT(v+4,' name'),CONCAT(v+4,'@yahoo.com')),
    (v+5,CONCAT(v+5,' name'),CONCAT(v+5,'@yahoo.com')),
    (v+6,CONCAT(v+6,' name'),CONCAT(v+6,'@yahoo.com')),
    (v+7,CONCAT(v+7,' name'),CONCAT(v+7,'@yahoo.com')),
    (v+8,CONCAT(v+8,' name'),CONCAT(v+8,'@yahoo.com')),
    (v+9,CONCAT(v+9,' name'),CONCAT(v+9,'@yahoo.com'))
    ;
    SET v = v + 10;
    END WHILE;
    END
    //

CREATE PROCEDURE load_table()
BEGIN
DECLARE v INT DEFAULT 0;
    WHILE v < 1000000
    DO
    INSERT INTO user (id,NAME,email)
    VALUES (v,CONCAT(v,' name'),CONCAT(v,'@yahoo.com')),
    (v+1,CONCAT(v+1,' name'),CONCAT(v+1,'@yahoo.com')),
    (v+2,CONCAT(v+2,' name'),CONCAT(v+2,'@yahoo.com')),
    (v+3,CONCAT(v+3,' name'),CONCAT(v+3,'@yahoo.com')),
    (v+4,CONCAT(v+4,' name'),CONCAT(v+4,'@yahoo.com')),
    (v+5,CONCAT(v+5,' name'),CONCAT(v+5,'@yahoo.com')),
    (v+6,CONCAT(v+6,' name'),CONCAT(v+6,'@yahoo.com')),
    (v+7,CONCAT(v+7,' name'),CONCAT(v+7,'@yahoo.com')),
    (v+8,CONCAT(v+8,' name'),CONCAT(v+8,'@yahoo.com')),
    (v+9,CONCAT(v+9,' name'),CONCAT(v+9,'@yahoo.com'))
    ;
    SET v = v + 10;
    END WHILE;
    END
    //

Results were surprizing, insert/select in non partition table giving better results.

mysql> select count(*) from user_partition;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.40 sec)

mysql> select count(*) from user;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.00 sec)


mysql> call load_table();
Query OK, 10 rows affected (20.31 sec)

mysql> call load_partition_table();
Query OK, 10 rows affected (21.22 sec)

mysql> select * from user where id = 999999;
+--------+-------------+------------------+---------------------+
| id     | name        | email            | updated_time        |
+--------+-------------+------------------+---------------------+
| 999999 | 999999 name | [email protected] | 2012-11-27 08:06:54 |
+--------+-------------+------------------+---------------------+
1 row in set (0.00 sec)

mysql> select * from user_no_part where id = 999999;
+--------+-------------+------------------+---------------------+
| id     | name        | email            | updated_time        |
+--------+-------------+------------------+---------------------+
| 999999 | 999999 name | [email protected] | 2012-11-27 08:03:14 |
+--------+-------------+------------------+---------------------+
1 row in set (0.00 sec)

So two question

1) Whats the best way to partition user table so that inserts and selects also become fast and removing FOREIGN KEY on location_id is correct? I know partition can be good only if we access on the base of partition key, In my case I want to read the table only by id. why inserts are slower in partition table?

2) What the best way to partition friend table as I want to partition friends on the bases of user_id as want to place all user friends in same partition and always access it using a user_id. Should I drop the primary key on friend.id or add the user_id in primary key?

like image 583
maaz Avatar asked Nov 27 '12 15:11

maaz


People also ask

Does MySQL support table partitioning?

MySQL supports several types of partitioning as well as subpartitioning; see Section 22.2, “Partitioning Types”, and Section 22.2. 6, “Subpartitioning”. Section 22.3, “Partition Management”, covers methods of adding, removing, and altering partitions in existing partitioned tables.

How many partitions can a MySQL table have?

Maximum number of partitions. Beginning with MySQL 5.6. 7, this limit is increased to 8192 partitions.

Does table partitioning improve query performance?

So In a way partitioning distribute your table's data across the file system, so when query is run on a table only a fraction of data is processed which result in better performance.


1 Answers

First I would recommend if possible that you upgrade to 5.6.5 or later of Mysql to ensure you are taking advantage of partitioning properly and with best performance. This is not always possible due to GA concerns, but my experience is that there was a difference in performance between 5.5 and 5.6, and 5.6 offers some other types of partitioning.

1) My experience is that inserts and updates ARE faster on partitioned sets as well as selects AS LONG AS YOU ARE INCLUDING THE COLUMN THAT YOU ARE PARTITIONING ON IN THE QUERY. If I ask for a count of all records across all partitions, I see slower responses. That is to be expected because the partitions are functioning LIKE separate tables, so if you have 30 partitions it is like reading 30 tables and not just one.

You must include the value you are partitioning on in the primary key AND it must remain stable during the life of the record.

2) I would include user_id and id in the primary key - assuming that your friends tables user_id and id do not change at all once the record is established (i.e. any change would be a delete/insert). In my case it was "redundant" but more than worth the access. Whether you choose user_id/id or id/user_id depends on your most frequent access.

A final note. I tried to create LOTS of partitions when I first started breaking my data into partitions, and found that just a few seemed to hit the sweet spot - 6-12 partitions seemed to work best for me. YMMV.

like image 137
TJChambers Avatar answered Oct 07 '22 17:10

TJChambers