Is it worth parallelizing queries with JDBC and MySQL?

Tags: java, mysql, jdbc

One JDBC SELECT statement takes 5 seconds to complete, so running 5 statements one after another takes 25 seconds.

Now I'm trying to do the job in parallel. The database is MySQL with InnoDB. I start 5 threads and give each thread its own database connection, but it still takes 25 seconds for all of them to complete. Why?

Note that I give Java enough heap and have 8 cores, but only one hard disk (maybe the single disk is the bottleneck here?).

Is this the expected behavior with MySQL out of the box? Here is example code:

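import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// `pool` is the application's connection pool, defined elsewhere
public void doWork(int n) {
    // Bind the range bounds as parameters instead of concatenating them into the SQL
    String sql = "select id from big_table where id between ? and ?";
    try (Connection conn = pool.getConnection();
         PreparedStatement stmt = conn.prepareStatement(sql)) {
        stmt.setLong(1, n * 1000000L);
        stmt.setLong(2, n * 1000000L + 1000000L);
        try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                long itemId = rs.getLong("id");
            }
        }
    } catch (SQLException e) {
        throw new RuntimeException(e);
    }
}

public void doWorkBatch() {
    // Run the 5 queries one after another
    for (int i = 0; i < 5; i++)
        doWork(i);
}

public void doWorkParallel() throws InterruptedException {
    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < 5; i++) {
        final int n = i; // a lambda can only capture an effectively final variable
        Thread t = new Thread(() -> doWork(n));
        threads.add(t);
        t.start();
    }
    // Wait for all 5 queries to finish instead of blocking on console input
    for (Thread t : threads)
        t.join();
}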
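As a sketch of the same experiment using an `ExecutorService` instead of raw threads, the class below submits one task per work unit, waits for all of them with `invokeAll`, and reports the wall-clock time next to a sequential run. The class name `ParallelTimer` and the sleeping stand-in task are assumptions for illustration only; in the real program each task would call `doWork(n)` against the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

public class ParallelTimer {
    // Runs work.accept(0..units-1) on a fixed pool of `units` threads and
    // returns the elapsed wall-clock time in milliseconds.
    static long timeParallel(int units, IntConsumer work) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(units);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < units; i++) {
                final int n = i; // effectively final copy for the lambda
                tasks.add(() -> { work.accept(n); return null; });
            }
            long start = System.nanoTime();
            // invokeAll blocks until every task has finished
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows any exception thrown inside a task
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    // Runs the same work units one after another on the calling thread.
    static long timeSequential(int units, IntConsumer work) {
        long start = System.nanoTime();
        for (int i = 0; i < units; i++) {
            work.accept(i);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Hypothetical stand-in for doWork(n): sleeps instead of querying MySQL.
    static void fakeQuery(int n) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sequential ms: " + timeSequential(5, ParallelTimer::fakeQuery));
        System.out.println("parallel ms:   " + timeParallel(5, ParallelTimer::fakeQuery));
    }
}
```

When the tasks are truly independent, the parallel time approaches that of a single task; seeing roughly the same total for both runs, as in the question, suggests the work is being serialized somewhere outside the Java client.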

(I don't recall where, but I read that a standard MySQL installation can easily handle 1000 connections in parallel.)

jack asked Oct 15 '25 03:10


2 Answers

Looking at your problem, multi-threading should definitely improve your performance. I once converted a 4-5 hour batch job into a 7-10 minute job by doing exactly what you're thinking of, but you need to know the following things beforehand while designing:

1) Think about inter-task dependencies, i.e. dependencies between tasks that are executed on different threads.

2) Using a connection pool is a good idea, since creating database connections in Java is a slow process.

3) Each thread needs its own JDBC connection. Connections shouldn't be shared between threads, because each connection carries its own transaction.

4) Split the work into several units, each of which does one job.

5) Particularly in your case, with MySQL, the storage engine you use also affects performance. InnoDB uses row-level locking, which lets it handle much higher concurrent traffic. The usual alternative, MyISAM, does not support row-level locking; it uses table locking. This matters in the case where another thread comes in and wants to update the same row before the first thread commits.

6) Another way to improve the performance of a Java database application is to run queries with setAutoCommit(false). By default, a new JDBC connection has auto-commit mode on, which means every individual SQL statement is executed in its own transaction. Without auto-commit, you can group SQL statements into a logical transaction, which is then either committed or rolled back by calling commit() or rollback().
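Points 3, 4 and 6 can be combined in a short sketch: split the id range into independent work units, and have each unit borrow its own connection, turn off auto-commit, and commit or roll back once at the end. The names `RangeWorkUnits`, `split` and `processRange` are hypothetical; `big_table` comes from the question, and the `DataSource` stands in for whatever connection pool is actually used. Half-open ranges (`id >= ? and id < ?`) are an assumption of this sketch: MySQL's `BETWEEN` is inclusive at both ends, so ranges built like the question's overlap by one id at each boundary.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

public class RangeWorkUnits {
    // Splits the id range [0, total) into `units` contiguous, non-overlapping
    // half-open ranges {from, toExclusive}; the last range absorbs any remainder.
    static List<long[]> split(long total, int units) {
        List<long[]> ranges = new ArrayList<>();
        long size = total / units;
        for (int i = 0; i < units; i++) {
            long from = i * size;
            long to = (i == units - 1) ? total : from + size;
            ranges.add(new long[] {from, to});
        }
        return ranges;
    }

    // Processes one work unit on its own connection taken from the pool.
    // Auto-commit is switched off so every statement in the unit runs in a
    // single transaction that is committed or rolled back as a whole.
    static long processRange(DataSource ds, long from, long toExclusive) throws SQLException {
        try (Connection conn = ds.getConnection()) {
            conn.setAutoCommit(false);
            try (PreparedStatement stmt = conn.prepareStatement(
                    "select id from big_table where id >= ? and id < ?")) {
                stmt.setLong(1, from);
                stmt.setLong(2, toExclusive);
                long rows = 0;
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        rows++; // further per-row statements would join the same transaction
                    }
                }
                conn.commit();
                return rows;
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```

Each `long[]` returned by `split` can be handed to a separate thread that calls `processRange`. For a pure SELECT like the one in the question, turning off auto-commit changes little; it mainly pays off when a unit issues many writes.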

You can also check out Spring Batch, which is designed for batch processing.

Hope this helps.

codechefvaibhavkashyap answered Oct 17 '25 16:10


It depends on where the bottleneck in your system is. If each of your queries spends a few seconds establishing the connection to the database and only a fraction of that actually running, you'd see a nice improvement. However, if the time is spent in MySQL running the actual query, you wouldn't see as much of a difference.
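The difference between the two cases can be illustrated with a toy simulation in plain Java, with no database involved: five threads each do 200 ms of "work", first independently and then while holding a single shared permit. The shared `Semaphore` is an assumption standing in for whatever might serialize the real queries (for example one disk or a saturated server resource); it is not a model of MySQL internals, and the class name `BottleneckDemo` is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class BottleneckDemo {
    // Runs `units` tasks on `units` threads; each task "works" for workMs.
    // If `shared` is non-null, the work happens while holding one of its permits,
    // modelling a resource that serves only a limited number of requests at a time.
    static long runMs(int units, long workMs, Semaphore shared) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(units);
        long start = System.nanoTime();
        for (int i = 0; i < units; i++) {
            pool.execute(() -> {
                try {
                    if (shared != null) shared.acquire();
                    try {
                        Thread.sleep(workMs); // the "query"
                    } finally {
                        if (shared != null) shared.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for all tasks
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // Independent work: total is close to one task's duration
        System.out.println("no shared bottleneck:   " + runMs(5, 200, null) + " ms");
        // Work serialized on a single-permit resource: total is close to the sum of all tasks
        System.out.println("single shared resource: " + runMs(5, 200, new Semaphore(1)) + " ms");
    }
}
```

With the single permit, the "parallel" run takes about as long as running the tasks one after another, which is the same pattern the question describes; only measuring where the time actually goes on the MySQL side tells you which case you are in.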

Rather than trying concurrent execution, the first thing I'd do is optimize the query: add indexes to your tables, and so forth.

Thomas answered Oct 17 '25 15:10