 

How to optimize this MySQL join on a large table?

I have a project where the admin needs to create multiple newsletters with some crawled posts from the web.

I insert the posts into the posts table after crawling has completed and assign them a feed_id to identify the source. This is the structure of the posts table (truncated):

CREATE TABLE `posts` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `feed_id` int(11) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NULL DEFAULT NULL,
  `identifier` varchar(255) DEFAULT NULL,
  `published` timestamp NULL DEFAULT NULL,
  `content` longtext,
  ...
  ...
  `is_unread` int(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Every admin (user) has access to one or more "feeds". On the newsletter creation page I want to show them a list of posts from the feeds they are allowed to see, along with a button to put each post into a specific category of that newsletter. If the user previously selected a post, I should show that and let them remove it from the category. So I have some other tables too: newsletters, categories, newsletter_post and category_post. Here are their structures:

newsletters:

CREATE TABLE `newsletters` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NULL DEFAULT NULL,
  `sent_at` timestamp NULL DEFAULT NULL,
  `title` varchar(255) DEFAULT NULL,
  `date` date DEFAULT NULL,
  `topic_id` int(11) NOT NULL,
  `user_id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

categories:

CREATE TABLE `categories` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `topic_id` int(11) NOT NULL,
  `title` varchar(255) DEFAULT NULL,
  `slug` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

newsletter_post:

CREATE TABLE `newsletter_post` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NULL DEFAULT NULL,
  `newsletter_id` int(11) NOT NULL,
  `post_id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

category_post:

CREATE TABLE `category_post` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NULL DEFAULT NULL,
  `category_id` int(11) NOT NULL,
  `post_id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

So I'm using this query to find posts for the allowed feeds and to check whether a post is already in a specific category of this specific newsletter:

SELECT DISTINCT `posts`.`id`, `published`, `posts`.`title`, `posts`.`content`, `source_name`, `category_id`, `newsletter_id`, `link_href`, categories.title as category_title
FROM `posts`
LEFT JOIN `category_post` ON `posts`.`id` = `category_post`.`post_id`
LEFT JOIN `categories` ON `categories`.`id` = `category_post`.`category_id`
LEFT JOIN `newsletter_post` ON `posts`.`id` = `newsletter_post`.`post_id`
LEFT JOIN `newsletters` ON `newsletters`.`id` = `newsletter_post`.`newsletter_id`
WHERE `feed_id` IN (6, 7) ORDER BY `posts`.`published` DESC LIMIT 40 OFFSET 0

The problem is that this is horribly slow and not optimized. My posts table grows by up to 50,000 rows each month, each row holding 3–10 KB of data on average, so sometimes when I run the query (which the admin runs frequently while building the newsletter, paginating, etc.) MySQL throws an error along the lines of "too many rows to join", and most of the time it is really slow.

The reason I'm doing all this in one query is that I want the result in one JSON response, so I can show it to the user quickly without making additional requests.

I want to know if there is a better way to write this query, use indexes, or do something else. Thank you in advance for your help.

asked Jan 03 '13 by Sallar


4 Answers

Index your posts table on

( feed_id, published )

so the data is already optimized for your WHERE clause, and pre-sorted to help your ORDER BY.
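
A possible way to create that composite index (the index name here is just an example):

    CREATE INDEX idx_feed_published ON posts (feed_id, published);

The leading feed_id column serves the WHERE feed_id IN (...) filter, and published helps with the date ordering within each matching feed.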

answered Oct 11 '22 by DRapp


For read queries under heavy demand, InnoDB is not very efficient. I recommend using a NoSQL database, but if you don't want to, or the cost of switching is too high... you can try this:

1) As others have already told you, you have to index your tables on the columns used in your JOIN queries. For example:

      CREATE INDEX index1 ON newsletter_post (post_id);

2) Use only the columns you really need.

I mean, list only the columns you actually need in the SELECT part of the query instead of pulling everything (see the sketch below).
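
For instance (just a sketch of the idea, not a drop-in fix): the longtext content column is by far the heaviest part of each row, so you could leave it out of the paginated join and fetch the bodies in a second query for only the 40 ids on the current page:

    -- same query, but without the heavy `content` column
    SELECT DISTINCT posts.id, published, posts.title, source_name,
           category_id, newsletter_id, link_href, categories.title AS category_title
    FROM posts
    LEFT JOIN category_post   ON posts.id = category_post.post_id
    LEFT JOIN categories      ON categories.id = category_post.category_id
    LEFT JOIN newsletter_post ON posts.id = newsletter_post.post_id
    LEFT JOIN newsletters     ON newsletters.id = newsletter_post.newsletter_id
    WHERE feed_id IN (6, 7)
    ORDER BY posts.published DESC
    LIMIT 40 OFFSET 0;

    -- then fetch the bodies only for the ids returned above
    SELECT id, content FROM posts WHERE id IN (/* ids from the first query */);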

I hope this'd be helpful.

answered Oct 11 '22 by Cesar


To complete the other answers, I suggest changing these types on the posts table:

1) Change feed_id to a smaller integer type such as SMALLINT UNSIGNED. Do you really have more than 65,535 feeds? (Note that the number in int(11) is only a display width, so to actually save space you need a genuinely smaller type, not int(4).)
2) Change is_unread to BIT or TINYINT(1) instead of int(1). This may not improve the query in the question, but judging by the field name, a one-bit flag is the correct type.

A further improvement: never blindly use the default int(11) for numeric or id fields; pick the smallest type that fits the data. Smaller column types also keep your indexes smaller. I don't think fields like feed_id or category_id need a full 4-byte INT.

For example, indexing and querying a SMALLINT column is faster than an INT column simply because the index takes less space.
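
A minimal sketch of those type changes, assuming you have far fewer than 65,535 feeds and that is_unread is really a yes/no flag (verify this against your data before running it):

    -- assumes feed ids fit in SMALLINT UNSIGNED and is_unread is a boolean flag
    ALTER TABLE posts
      MODIFY feed_id SMALLINT UNSIGNED NOT NULL,
      MODIFY is_unread TINYINT(1) NOT NULL DEFAULT 1;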

answered Oct 11 '22 by Afshin Mehrabani


Please create the following indexes (example statements below):

1) `post_id` in `category_post`
2) `post_id` in `newsletter_post`
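
For example (the index names are arbitrary):

    CREATE INDEX idx_category_post_post_id ON category_post (post_id);
    CREATE INDEX idx_newsletter_post_post_id ON newsletter_post (post_id);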

answered Oct 11 '22 by Sashi Kant