
Creating an index takes too long

Tags:

mysql

innodb

About 2 months ago, I imported the English Wikipedia data (http://dumps.wikimedia.org/enwiki/20120211/) into MySQL.

Since the import finished, I have been creating indexes on the tables of the EnWikipedia database, which has taken about 2 months so far.

I have now reached the point of creating the index on "pagelinks".

However, this step seems to be taking forever.

So I estimated the remaining time to check whether my intuition was correct.

The estimate came out at about 60 days (assuming I create the index on "pagelinks" again from the beginning).

My EnWikipedia database has 7 tables:
"categorylinks" (records: 60 mil, size: 23.5 GiB),
"langlinks" (records: 15 mil, size: 1.5 GiB),
"page" (records: 26 mil, size: 4.9 GiB),
"pagelinks" (records: 630 mil, size: 56.4 GiB),
"redirect" (records: 6 mil, size: 327.8 MiB),
"revision" (records: 26 mil, size: 4.6 GiB) and
"text" (records: 26 mil, size: 60.8 GiB).

My server: Linux version 2.6.32-5-amd64 (Debian 2.6.32-39), 16 GB memory, 2.39 GHz Intel 4-core CPU.

Is it normal for index creation to take this many days? Does anyone know a good way to create the index more quickly?

Thanks in advance!


P.S. I did the following to estimate the remaining time.

Reference (sorry, the page is written in Japanese): http://d.hatena.ne.jp/sh2/20110615

1st. I counted the records in "pagelinks".

 mysql> select count(*) from pagelinks;
 +-----------+ 
 | count(*)  |
 +-----------+ 
 | 632047759 |
 +-----------+ 
 1 row in set (1 hour 25 min 26.18 sec)

2nd. I measured how many records are written per minute.

getHandler_write.sh

 #!/bin/bash

 while true
 do
   cat <<_EOF_
 SHOW GLOBAL STATUS LIKE 'Handler_write';
 _EOF_
   sleep 60
 done | mysql -u root -p -N

Command:

 $ sh getHandler_write.sh
 Enter password:
 Handler_write   1289808074
 Handler_write   1289814597
 Handler_write   1289822748
 Handler_write   1289829789
 Handler_write   1289836322
 Handler_write   1289844916
 Handler_write   1289852226

3rd. I computed the write rate.

According to the result of step 2, the write rate is

 7233 records/minute

4th. The remaining time is then

 (632047759/7233)/60/24 = 60 days
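
Steps 3 and 4 can be reproduced with a short script. This is only a sketch: the function names `rate_per_minute` and `days_remaining` are made up for illustration, and it assumes the readings are taken 60 seconds apart, as in getHandler_write.sh. Note that `Handler_write` is a server-wide counter, so any other writes on the server inflate the rate. The seven readings shown above average slightly higher than the 7233 figure quoted in step 3, which presumably came from a longer sample.

```bash
#!/bin/bash

# Average counter increase per interval. Reads one sample per line and uses
# the last field, so both bare numbers and "Handler_write <value>" lines work.
# Assumes samples are 60 seconds apart, so the result is records per minute.
rate_per_minute() {
  awk 'NR == 1 { first = $NF }
       { last = $NF }
       END { if (NR > 1) printf "%.0f\n", (last - first) / (NR - 1) }'
}

# Days needed to write $1 records at $2 records per minute.
days_remaining() {
  awk -v total="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", total / rate / 60 / 24 }'
}

# The seven readings from the output shown in step 2.
samples="1289808074
1289814597
1289822748
1289829789
1289836322
1289844916
1289852226"

rate=$(printf '%s\n' "$samples" | rate_per_minute)
echo "rate: $rate records/minute"
echo "remaining: $(days_remaining 632047759 "$rate") days"
```

To feed it live data instead, the output of getHandler_write.sh could be saved to a file and piped into `rate_per_minute`.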
sensing_bot asked Apr 12 '12 10:04




1 Answer

Those are pretty big tables, so I'd expect the indexing to be pretty slow. 630 million records is a LOT of data to index.

One thing to look at is partitioning: with data sets that large, performance will be slow without correctly partitioned tables. Here's a useful link: using partitioning on slow indexes.

You could also try looking at the buffer size settings for building the indexes (the default is 8 MB, so for your large table that's going to slow you down a fair bit): buffer size documentation.
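
Before changing any buffer setting, it may help to check what the server is currently using. The sketch below follows the same heredoc-into-mysql pattern as getHandler_write.sh. The answer does not say which setting it means, so the three variable names queried here are assumptions about which ones might be relevant, and `buffer_size_query` is a hypothetical helper name.

```bash
#!/bin/bash

# Print a query that lists a few buffer-related server variables.
# The variable names are assumptions; the answer does not name a setting.
buffer_size_query() {
  cat <<_EOF_
SHOW GLOBAL VARIABLES WHERE Variable_name IN
  ('innodb_buffer_pool_size', 'myisam_sort_buffer_size', 'sort_buffer_size');
_EOF_
}

# Usage (prompts for the password, like getHandler_write.sh):
#   buffer_size_query | mysql -u root -p -N
buffer_size_query
```

Run as written, the script only prints the query; piping it into `mysql` as in the usage comment returns the current values in bytes.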

Tim answered Sep 20 '22 05:09
