
Scanning a MySQL table: why is fetching the result set row by row much faster than retrieving it completely into memory?

We have an InnoDB table with 12,000,000+ records.

I use two ways to SELECT * from this table using JDBC.

Statement stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

This way lets the driver stream the result set row by row, and it takes 7s to finish the scan.
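For completeness, here is a minimal, self-contained sketch of that streaming scan; the JDBC URL, credentials and the table name big_table are placeholders, not the actual names from my setup.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingScan {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password")) {
            // A forward-only, read-only statement plus a fetch size of Integer.MIN_VALUE
            // is Connector/J's signal to stream rows instead of buffering the whole set.
            Statement stmt = conn.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            stmt.setFetchSize(Integer.MIN_VALUE);

            long rows = 0;
            long start = System.currentTimeMillis();
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                while (rs.next()) {
                    rows++; // each row is discarded right after it is read
                }
            }
            System.out.printf("scanned %d rows in %d ms%n",
                    rows, System.currentTimeMillis() - start);
        }
    }
}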

Statement stmt = conn.createStatement();

With this default statement, the ResultSet is retrieved completely and stored in client memory, and this way takes 21s!
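For contrast, a minimal sketch of the default mode (same hypothetical connection and table as above): with a plain Statement, Connector/J fetches and buffers the entire result set on the client before iteration starts.

Statement stmt = conn.createStatement();
try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
    while (rs.next()) {
        // by the time we get here, all 12,000,000+ rows are already held in client memory
    }
}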

I'm just confused: why is fetching the result set row by row faster than retrieving the complete result set into client memory? Shouldn't the row-by-row approach spend more time on network transfers?

asked Nov 12 '22 by IvyTang

1 Answer

Just to expand on my comment on the OP:

This is most likely a memory issue: reading 12 million rows into memory can cause paging unless the client has a lot of RAM, and as soon as you start thrashing the disk, performance drops considerably. It's also worth noting that if you do start adding RAM, the JVM has a quirk in how it addresses heaps larger than ~32G (it switches to full 64-bit object pointers), which means that as you transition past 32G you actually lose some usable memory and may hit other issues depending on how your code is written.
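To illustrate the difference, here's a rough sketch (not the OP's actual code; the Row type and column layout are made up): the buffered style keeps every row reachable on the heap, while the streaming style lets each row become garbage as soon as it has been processed, so the working set stays roughly constant.

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class Row {
    final long id;
    final String payload;
    Row(long id, String payload) { this.id = id; this.payload = payload; }
}

class ScanModes {
    // Buffered style: heap usage grows with the row count; with 12M+ rows this
    // can push the JVM into heavy GC or the OS into paging.
    static List<Row> readAll(ResultSet rs) throws SQLException {
        List<Row> rows = new ArrayList<>();
        while (rs.next()) {
            rows.add(new Row(rs.getLong(1), rs.getString(2)));
        }
        return rows;
    }

    // Streaming style: only the current row is live, so memory use stays flat
    // no matter how many rows the table holds.
    static long processAll(ResultSet rs) throws SQLException {
        long count = 0;
        while (rs.next()) {
            count++; // do per-row work here, then let the row go
        }
        return count;
    }
}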

To put things into perspective, we're currently using Elasticsearch to index ~60 million documents. Admittedly its memory usage is more involved, since it's handling indices, caches, etc., but we wouldn't consider giving it less than 16G of RAM to get performant responses. I've met people using more than 100G per shard for really big record sets.

answered Jan 04 '23 by Basic