Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL High Write Latency

I'm developing a social-like application which is currently deployed using AWS services. In particular, the DB runs on RDS using MYSQL. So far, we're testing the app using a limited number of users (mostly friends) resulting in an average of 15 Write IOPS/sec.

The real problem is related to the very high writing latency of the db, which is always above 100ms. The RDS instance is a db.m3.xlarge which is much more than what we need.

I tried to perform a load test in a separate instance (identical configuration of DB and EC2) but i've not been able to reproduce such a high latency, even if I was sending a much higher number of requests. So I thought it may be due to table fragmentation, but i've not yet run a table optimisation, because the db wouldn't be accessible during this procedure.

Do you have any experience with this problem?

MORE INFO

  • We're using mysql version 5.6.21 with INNODB as storage engine.
  • The whole DB is about 100MB in size
  • The biggest table (called Message) has about 790k rows. Concerning this table, the following query

    insert into Message (user_id, creationDate, talk_id, text, id) 
    values (2015, '2015-02-01 16:40:06.737', 18312, 'Some text ', 904870)
    

    took 11s to be executed.

  • Even worse, the query

    insert into Comment (anonymous, user_id, creationDate, deleted, post_id, text, id) 
    values (1, 107347, '2015-02-01 16:40:01.849', 0, 124888, 'Comment text', 265742)
    

    took 14s, but the table Comment has about 160k.

Those two tables are generated by:

CREATE TABLE `comment` (
    `id` bigint(20) NOT NULL,
    `anonymous` bit(1) NOT NULL,
    `creationDate` datetime NOT NULL,
    `deleted` bit(1) NOT NULL,
    `text` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL,
    `user_id` bigint(20) NOT NULL,
    `post_id` bigint(20) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `FK_jhvt6d9ap8gxv67ftrmshdfhj` (`user_id`),
    KEY `FK_apirq8ka64iidc18f3k6x5tc5` (`post_id`),
    CONSTRAINT `FK_apirq8ka64iidc18f3k6x5tc5` FOREIGN KEY (`post_id`) REFERENCES `post` (`id`),
    CONSTRAINT `FK_jhvt6d9ap8gxv67ftrmshdfhj` FOREIGN KEY (`user_id`) REFERENCES `kuser` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

and

CREATE TABLE `message` (
    `id` bigint(20) NOT NULL,
    `creationDate` datetime NOT NULL,
    `text` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL,
    `user_id` bigint(20) NOT NULL,
    `talk_id` bigint(20) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `FK_d0j091jvk2y4mmfbadnqlohtf` (`user_id`),
    KEY `FK_64tr15t6wu5y9u143gxt6o3g2` (`thread_id `),
    CONSTRAINT `FK_64tr15t6wu5y9u143gxt6o3g2` FOREIGN KEY (`thread_id`) REFERENCES `thread` (`id`),
    CONSTRAINT `FK_d0j091jvk2y4mmfbadnqlohtf` FOREIGN KEY (`user_id`) REFERENCES `kuser` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

SOME PLOTS

Using AppDynamics I've been able to extract the following plots:

  • Wait States: Isn't the query end time too big?

    SQL Wait states

  • Page Buffer:

    Page Buffer

  • Write Latency and Queue:

    RDS Stats

Query Cache

+------------------------------+-----------+
| Variable_name                | Value     |
+------------------------------+-----------+
| query_cache_limit            | 1048576   |
| query_cache_min_res_unit     | 4096      |
| query_cache_size             | 1048576   |
| query_cache_type             | OFF       |
| query_cache_wlock_invalidate | OFF       |
+------------------------------+-----------+

Thank for your help!

Andrea

like image 385
a.periz Avatar asked Feb 10 '15 16:02

a.periz


1 Answers

I got in touch with RDS engineers from amazon and they gave me the solution. Such a high latency was due to a very low performing storage type. Indeed, I was using the default 5GB SSD (which they call GP2) which gives 3 IOPS per GB of storage, resulting in 15 IOPS when my application required about 50 IOPS or even more.

Therefore, they suggested me to change the storage type to Magnetic which provides 100 IOPS as baseline. Moreover, I've also been able to decrease the instance type because the bottleneck was only the disk.

The migration took about 3h due to the very low performance of the source disk (GP2).

Hope it may help someone out there!

like image 129
a.periz Avatar answered Nov 09 '22 00:11

a.periz