 

Handling a huge MySQL Table

Hope you are all doing great. We have a huge MySQL table called 'posts'. It has about 70,000 records and has grown to about 10 GB in size.

My boss says that something has to be done to make this huge table easier for us to handle, because if the table gets corrupted it would take us a lot of time to recover it. Also, at times it's slow.

What are the possible solutions so that handling this table becomes easier for us in all respects?

The structure of the table is as follows:

CREATE TABLE IF NOT EXISTS `posts` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `thread_id` int(11) unsigned NOT NULL,
  `content` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `first_post` mediumtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `publish` tinyint(1) NOT NULL,
  `deleted` tinyint(1) NOT NULL,
  `movedToWordPress` tinyint(1) NOT NULL,
  `image_src` varchar(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  `video_src` varchar(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `video_image_src` varchar(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `thread_title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `section_title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `urlToPost` varchar(280) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `posts` int(11) DEFAULT NULL,
  `views` int(11) DEFAULT NULL,
  `forum_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `subject` varchar(150) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `visited` int(11) DEFAULT '0',
  `replicated` tinyint(4) DEFAULT '0',
  `createdOn` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `urlToPost` (`urlToPost`,`forum_name`),
  KEY `thread_id` (`thread_id`),
  KEY `publish` (`publish`),
  KEY `createdOn` (`createdOn`),
  KEY `movedToWordPress` (`movedToWordPress`),
  KEY `deleted` (`deleted`),
  KEY `forum_name` (`forum_name`),
  KEY `subject` (`subject`),
  FULLTEXT KEY `first_post` (`first_post`,`thread_title`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=78773 ;

Thanking You.

UPDATED

Note: although I am grateful for the replies, almost all answers have been about optimizing the current database and not about how to handle large tables in general. Although I can optimize the database based on the replies I got, that really does not answer the question about handling huge databases. Right now I am talking about 70,000 records, but over the next few months, if not weeks, we are going to grow by an order of magnitude. Each record can be about 300 KB in size.

asked Jun 22 '11 by Imran Omar Bukhsh


2 Answers

My answer's also an addition to two previous comments.

You've indexed half of your table. But if you take a look at some of those indexes (publish, deleted, movedToWordPress) you'll notice the columns only hold 1 or 0, so their selectivity is low (the number of distinct values divided by the number of rows). Those indexes are a waste of space.
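
A quick way to confirm that (a sketch against the posts table shown above; only drop the indexes if nothing else relies on them):

    -- Ratios close to 0 mean the column has almost no distinct values,
    -- so an index on it rarely helps the optimizer.
    SELECT
        COUNT(DISTINCT publish)          / COUNT(*) AS publish_selectivity,
        COUNT(DISTINCT deleted)          / COUNT(*) AS deleted_selectivity,
        COUNT(DISTINCT movedToWordPress) / COUNT(*) AS moved_selectivity
    FROM posts;

    -- If the ratios are tiny, the single-column flag indexes can be dropped:
    ALTER TABLE posts
        DROP INDEX `publish`,
        DROP INDEX `deleted`,
        DROP INDEX `movedToWordPress`;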

Some things also make no sense. tinyint(4) doesn't actually make it a 4-digit integer; the number there is only the display width. A tinyint is 1 byte, so it has 256 possible values. I'm assuming something went wrong there.

Also, 10 GB in size for just ~75k records? How did you measure the size? And what hardware are you running on?
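
If you haven't measured it from within MySQL yet, information_schema reports the data and index size per table (a sketch; it assumes posts lives in the currently selected database):

    SELECT table_name,
           ROUND(data_length  / 1024 / 1024) AS data_mb,
           ROUND(index_length / 1024 / 1024) AS index_mb
    FROM information_schema.TABLES
    WHERE table_schema = DATABASE()
      AND table_name   = 'posts';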

Edit regarding your updated question:

There are many ways to scale databases. I'll link one SO question/answer so you can get an idea of what you can do: here it is. The other thing you might do is get better hardware. Usually, the reason databases get slow as they grow is the HDD subsystem and the amount of memory left to work with the dataset. The more RAM you have, the faster it all gets.

Another thing you could do is split your table into two, such that one table holds the textual data and the other holds the data your system needs to perform searching or matching (you'd put the integer fields there). Using InnoDB, you'd gain a huge performance boost if the two tables were connected by a foreign key pointing to the primary key. Since InnoDB makes primary key lookups fast, you open up several new possibilities for what you can do with your dataset. If your data keeps growing, you can get enough RAM and InnoDB will try to buffer the dataset in memory. There's also an interesting tool called HandlerSocket that does some neat magic on servers that have enough RAM and use InnoDB.
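
As a rough illustration of that split (a sketch only: posts_meta and posts_content are made-up names, and which columns belong in the narrow table depends on your actual queries):

    -- Narrow table with the fields used for searching/matching.
    CREATE TABLE posts_meta (
      id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
      thread_id  INT UNSIGNED NOT NULL,
      publish    TINYINT(1)   NOT NULL,
      deleted    TINYINT(1)   NOT NULL,
      views      INT          DEFAULT NULL,
      createdOn  TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (id),
      KEY thread_id (thread_id)
    ) ENGINE=InnoDB;

    -- Wide table holding the bulky text, joined 1:1 via the primary key.
    CREATE TABLE posts_content (
      post_id  INT UNSIGNED NOT NULL,
      content  LONGTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
      PRIMARY KEY (post_id),
      CONSTRAINT fk_posts_content
        FOREIGN KEY (post_id) REFERENCES posts_meta (id)
    ) ENGINE=InnoDB;

Queries that filter on the integer fields then only touch the narrow table and join to posts_content by primary key when the text is actually needed.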

In the end it really boils down to what you need to do and how you are doing it. Since you didn't mention that, it's hard to estimate what you should do. My first step toward optimizing would definitely be to tweak the MySQL instance and to back that big table up.

answered by Michael J.V.


I guess you have to change some columns.

You can start by reducing your VARCHAR columns.

image_src / video_src / video_image_src as VARCHAR(500) is a little too much, I think (VARCHAR(100) should be enough, I would say).

thread_title is TEXT but should be a VARCHAR(200?) if you ask me; same with section_title.

OK, here is your problem: content is LONGTEXT.

Do you really need LONGTEXT here? LONGTEXT can hold up to 4 GB per value. I think if you change this column to TEXT it would be a lot smaller:

    TINYTEXT      255 bytes
    TEXT          65,535 bytes          ~64 KB
    MEDIUMTEXT    16,777,215 bytes      ~16 MB
    LONGTEXT      4,294,967,295 bytes   ~4 GB
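
Pulling those suggestions together, the changes could look something like this (a sketch only; the new sizes are the guesses above, and switching content to TEXT would silently truncate any value longer than ~64 KB, so check the maximum length first):

    -- Check how long the longest content value actually is before shrinking it:
    SELECT MAX(LENGTH(content)) FROM posts;

    -- Sketch of the suggested changes (column sizes are guesses, not measured):
    ALTER TABLE posts
      MODIFY image_src       VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      MODIFY video_src       VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
      MODIFY video_image_src VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
      MODIFY thread_title    VARCHAR(200) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
      MODIFY section_title   VARCHAR(200) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
      MODIFY content         TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;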

Edit: I see you use a FULLTEXT index. I am quite sure that index is storing a lot of data. You should use another mechanism for full-text searching.

answered by Michael Koper