Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL configuration for a data science workload?

All of what I find on the web for advice on tuning MySQL for performance deals with production databases that have a high number of connections and many repeated queries. This is not my workload, instead, I'm doing data investigation with MySQL where I am the only user, the data doesn't change very often (bulk imports only), and the number of connections I might have at any given time is < 20. The data I have is largish (several hundred gigs, tables with 50M rows with a bunch of strings in them), but the queries I write are rarely run more than a few times each.

I have the O'Reilly Schwartz et al. book on MySQL and it has been a godsend for understanding how to make some things (like indices) work to my advantage. Yet I feel much less comfortable with the server parameters for this kind of workload, as I can find few examples on the web. Here are the non-stock (MySQL 5.5, Ubuntu) parameters I am running with:

max_heap_table_size=32G
tmp_table_size=32G
join_buffer_size=6G
innodb_buffer_pool_size=10G
innodb_buffer_pool_instances=2
sort_buffer_size=100M

My server is a multi-core (quad, seems wasted on MySQL but sometimes I'll hit up a couple of queries at once) 32GB of RAM machine. Right now it looks like MySQL is limiting itself to 12GB of ram, likely because of the innodb_buffer_pool size. I set tmp_table_size and heap size to be just fantastical because I had been doing some queries where I stored a lot in memory.

Are there any good resources to tune MySQL to this kind of workload? Are there suggestions on what parameters I should set for innodb?

like image 616
Christopher Avatar asked Apr 24 '26 01:04

Christopher


1 Answers

I don't think you have to tune your InnoDB engine performance any more. The real performance gain will be in the way you structure tables, and the queries you write. Be sure that the columns you select on are indexed, sensible primary keys are chosen, etc. Tables with 50M rows shouldn't be a problem as long as you have a good primary key.

If you haven't run into any performance bottlenecks yet, then I think there is no reason to worry.

like image 107
ktm5124 Avatar answered Apr 26 '26 15:04

ktm5124



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!