Our company has a Batch Application which runs every day, It does some database related jobs mostly, import data into database table from file for example.
There are 20+ tasks defined in that application, each one may depends on other ones or not. The application execute tasks one by one, the whole application runs in a single thread.
It takes 3~7 hours to finish all the tasks. I think it's too long, so I think maybe I can improve performance by multi-threading.
I think as there is dependency between tasks, it not good (or it's not easy) to make tasks run in parallel, but maybe I can use multi-threading to improve performance inside a task.
for example : we have a task defined as "ImportBizData", which copy data into a database table from a data file(usually contains 100,0000+ rows). I wonder is that worth to use multi-threading?
As I know a little about multi-threading, I hope some one provide some tutorial links on this topic.
Multithreading is a Java feature that allows concurrent execution of two or more parts of a program for maximum utilization of CPU. Each part of such program is called a thread. So, threads are light-weight processes within a process. Threads can be created by using two mechanisms : Extending the Thread class.
1) It doesn't block the user because threads are independent and you can perform multiple operations at the same time. 2) You can perform many operations together, so it saves time. 3) Threads are independent, so it doesn't affect other threads if an exception occurs in a single thread.
JDBC allows you to share a single Connection among multiple threads.
Multi threading may be of help, if the lines are uncorrelated, you may start off two processes one reading even lines, another uneven lines, and get your db connection from a connection pool (dbcp) and analyze performance. But first I would investigate whether jdbc is the best approach normally databases have optimized solutions for imports like this. These solutions may also temporarily switch of constraint checking of your table, and turn that back on later, which is also great for performance. As always depending on your requirements.
Also you may want to checkout springbatch which is designed for batch processing.
Multi-threading will improve your performance but there are a couple of things you need to know:
commit
once in a while to avoid accumulating huge rollback/undo tables.To elaborate the last point: Currently, you have a task that reads a file, parses it, opens a JDBC connection, does some calculations, sends the data to the database, etc.
What you should do:
The first and the last threads are pretty slow because they are I/O bound (hard disks are slow and network connections are even worse). Plus inserting data in a database is a very complex task (allocating space, updating indexes, checking foreign keys)
Using different worker threads gives you lots of advantages:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With