
MySQL: 7 billion records in a table

I need to store around 7.8 billion records in a MySQL table. The table is both read and write intensive. I have to sustain an insertion rate of at least 0.02 billion records per hour, and any search on the table should take no more than 10 seconds. We have a UI from which users can search on different column properties.

The most common search queries look like:

  1. select * from mytable where prop1='sip:+100008521149' and prop2='asdsa' order by event_timestamp desc limit 10;

  2. select * from mytable where prop1='sip:+100008521149' order by event_timestamp desc limit 10;

  3. select * from mytable where prop2='asdsa' order by event_timestamp desc limit 10;

Currently there are 2 indexes on the table:

1- idx_1(prop1,event_timestamp)
2- idx_2(prop2,event_timestamp)

InnoDB settings are as follows:

    innodb_buffer_pool_size = 70G
    innodb_log_file_size = 4G
    innodb_io_capacity=2000
    innodb_io_capacity_max=6000
    innodb_lru_scan_depth=2000
    innodb_flush_log_at_trx_commit=2
    innodb_log_buffer_size=16M
    innodb_thread_concurrency = 0
    innodb_read_io_threads = 64
    innodb_write_io_threads = 64
    innodb_autoinc_lock_mode = 2
    bulk_insert_buffer_size=33554432
    query_cache_type=1
    query_cache_size=64M
    innodb_flush_neighbors=0
    expire_logs_days=10
    max-connections=500
    long_query_time = 5
    read_buffer_size=16M
    sort_buffer_size=16M
    read_rnd_buffer_size=16M
    innodb_doublewrite = 0
    innodb_flush_method=O_DIRECT

Machine's RAM size is 99 GB.

The system was fast when it started, but performance dropped sharply once the row count reached 0.22 billion. Even though we are using LOAD DATA INFILE, the insertion rate became very slow. Searches on indexed columns were still pretty fast. It looks like the buffer pool is not big enough.
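One way to confirm a buffer pool shortage is to compare logical read requests against reads that had to go to disk (this is standard MySQL status output, not specific to this setup):

```sql
-- Innodb_buffer_pool_reads (disk reads) growing fast relative to
-- Innodb_buffer_pool_read_requests (logical reads) indicates the
-- working set no longer fits in the buffer pool.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```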

I have a few questions:

  1. Is it possible to support this kind of data volume with this configuration?

  2. What would be an ideal and practical buffer pool size for 7 billion records?

  3. DATA+INDEX size is already close to 150 GB with only 0.22 billion records; it looks like I would need TBs of RAM.

  4. We are considering a Master/Slave configuration so that writes and reads are handled by dedicated servers.

  5. Is there a better way to design this solution?

  6. Adding more indexes makes the UI search better, but adding even a single index reduces insertion speed many fold.

UPDATE 1:

Q- The table is much bigger than RAM, correct? The buffer_pool cannot be made big enough -- it must be smaller than ram, or else performance suffers.

A- RAM size is 100 GB, buffer pool is 70 GB. Yes, the data size is much bigger than RAM.

Q- Please provide SHOW CREATE TABLE; there are several issues I need to investigate. (datatypes, field sizes, etc)

A- All the fields are string types; we have used varchar(127) for all of them. The PK is an auto-generated id, bigint(20).

Q- How many records in the LOAD DATA INFILE? Do you LOAD directly into the table? How often is the LOAD?

A- 100,000 records per file. Multiple threads load data from CSV files into the DB. For the initial migration we have to load continuously until we reach 0.65 billion records. After that, the frequency will drop to roughly one load every 15 minutes.
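For reference, a per-file load of this kind typically looks like the following sketch; the file path, delimiters, and column list here are assumptions, not taken from the actual setup:

```sql
-- Sketch only: path, delimiters, and column names are hypothetical.
LOAD DATA INFILE '/data/batch_0001.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(prop1, prop2, event_timestamp);
```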

Q- Master+Slave: Keep in mind that all writes are performed on the Slave, too. If you have lots of reads, then more than one Slave would spread the reads around, thereby getting some scaling.

A- We are testing currently with MASTER/SLAVE approach.

We made the MASTER MyISAM with no indexes; it will be used for inserts. The SLAVE is InnoDB with the 2 indexes; searches will be performed on it. They are separate machines and do not share RAM or CPU. The application runs on a third machine.

Q- Do you have spinning drives? Or SSDs?

A- How do I check that?
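One way to check, assuming a Linux host:

```shell
# ROTA = 1 means a rotational (spinning) disk, ROTA = 0 means SSD/NVMe.
lsblk -d -o NAME,ROTA
```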

Q- Your rows seem to be quite large. Are there TEXTs or BLOBs? If so, SELECT * may be a serious performance burden.

A- Yes, rows have 50 columns, but only around 15-20 of them actually hold data. We can't reduce the datatype sizes because every field can hold any amount of alphanumeric data. All are TEXTs, no BLOBs.

Mudit bhaintwal asked Jul 07 '16

1 Answer

Turn off the query cache: MySQL must purge all entries in the QC each time an INSERT occurs -- at your target rate, that is about 5,555 times a second!

    query_cache_type = 0
    query_cache_size = 0
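These settings can also be applied at runtime without a restart (this applies to MySQL 5.7 and earlier; the query cache was removed entirely in 8.0):

```sql
-- Disable the query cache immediately; the my.cnf lines above make it permanent.
SET GLOBAL query_cache_type = 0;
SET GLOBAL query_cache_size = 0;
```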

The first query needs INDEX(prop1, prop2, event_timestamp). (The prop1 and prop2 can be swapped.)

With that added index, each of the three queries will touch no more than 10 rows in the index, and do no more than 10 random(?) fetches into the data. At worst, that is only about 11 disk hits. And @Bernd's 'lazy eval' will not make it any better.
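Concretely, the suggested index could be added like this (the index name idx_3 is an assumption; on a table this size the ALTER will take a long time and should be run in a low-traffic window):

```sql
-- Covers query 1 directly; the (prop1, event_timestamp) prefix still serves query 2.
ALTER TABLE mytable ADD INDEX idx_3 (prop1, prop2, event_timestamp);
```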

The table is much bigger than RAM, correct? The buffer_pool cannot be made big enough -- it must be smaller than ram, or else performance suffers.

Please provide SHOW CREATE TABLE; there are several issues I need to investigate. (datatypes, field sizes, etc)

How many records in the LOAD DATA INFILE? Do you LOAD directly into the table? How often is the LOAD?

Master+Slave: Keep in mind that all writes are performed on the Slave, too. If you have lots of reads, then more than one Slave would spread the reads around, thereby getting some scaling.

Do you have spinning drives? Or SSDs?

Your rows seem to be quite large. Are there TEXTs or BLOBs? If so, SELECT * may be a serious performance burden.

Rick James answered Sep 16 '22