
How can I implement multithreading in Java to process 2 million text files?

I have to process around 2 million text files and generate their triples.

Suppose I have a text file xyz.txt (one of the 2 million input files). It is processed as follows:

start(xyz.txt) ----> module1(xyz.tpd) ----> module2(xyz.adv) ----> module3(xyz.tpl)

Please suggest a logic or concept that would let me process these files faster and in an optimized way on x64 4 GB Windows systems.

module1 (working): it parses the txt file using a .bat file that invokes the parser. The parser runs as a separate system thread, and after 15 seconds it starts parsing the next txt file, and so on.

module2 (working): it accepts a .tpd file as input and generates an .adv file.

module3 (working): it accepts an .adv file as input and generates a .tpl file (triples).

Should I start the threads at the txt files or at some other point? I am afraid that the CPU will get stuck in context switching.

Does anyone have a better logic that I can try?

Roshan asked Jan 13 '23 05:01


2 Answers

Use a ThreadPoolExecutor. Tune its parameters, such as the number of active threads, to suit your environment and system.
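A minimal sketch of what that could look like, not a definitive implementation. The class name `TripleBatch`, the method `processFile` (a placeholder for the module1 -> module2 -> module3 chain on one file), the queue size of 1000 and the thread count are all assumptions made for illustration. The bounded queue with `CallerRunsPolicy` is one way to avoid holding 2 million queued tasks in memory at once: when the queue is full, the submitting thread runs the task itself and so slows down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TripleBatch {

    // Hypothetical placeholder for running module1 -> module2 -> module3
    // on a single file; here it only derives the output file name.
    public static String processFile(String txtName) {
        String base = txtName.substring(0, txtName.lastIndexOf('.'));
        return base + ".tpl";
    }

    // Runs processFile for every input on a fixed-size pool and
    // returns how many files were completed.
    public static int processAll(List<String> files, int threads)
            throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads,                 // core and maximum pool size
                0L, TimeUnit.MILLISECONDS,        // keep-alive for idle extra threads
                new ArrayBlockingQueue<>(1000),   // bounded queue (size is an assumption)
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when full
        for (String f : files) {
            pool.execute(() -> {
                processFile(f);
                done.incrementAndGet();
            });
        }
        pool.shutdown();                          // accept no new tasks
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for queued tasks to finish
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            files.add("file" + i + ".txt");
        }
        // Starting point only; tune after measuring on the real machine.
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(processAll(files, threads) + " files processed");
    }
}
```

With a fixed pool close to the number of cores, context switching stays limited; if the work turns out to be dominated by waiting on the disk or on the external parser, a different thread count may suit better, which only measurement will tell.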

Sumit Desai answered Jan 15 '23 19:01


Most importantly, you have to write the program, profile it, and see where the bottleneck is. It is quite likely that disk I/O will be the bottleneck, and in that case no amount of multithreading will solve your problem.

If so, using two (three? four?) separate hard drives may yield a bigger speed gain than the best multithreaded solution.

Furthermore, the general rule is that you should optimize your application only when you have working code and you really know what to optimize. Profile, profile, profile.
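Before reaching for a full profiler, a rough first measurement can be as simple as timing each stage. The sketch below is only an illustration: the `StageTimer` class and the stage names are hypothetical, and the lambdas stand in for the real module calls. It accumulates wall-clock time per stage with `System.nanoTime()` so you can see which module dominates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

public class StageTimer {
    // Total elapsed nanoseconds and call counts per stage name; thread-safe.
    private final Map<String, LongAdder> nanos = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();

    // Runs the given work, records how long it took under the stage name,
    // and passes the result through unchanged.
    public <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanos.computeIfAbsent(stage, k -> new LongAdder())
                 .add(System.nanoTime() - start);
            calls.computeIfAbsent(stage, k -> new LongAdder()).increment();
        }
    }

    public long callCount(String stage) {
        LongAdder a = calls.get(stage);
        return a == null ? 0L : a.sum();
    }

    public long totalNanos(String stage) {
        LongAdder a = nanos.get(stage);
        return a == null ? 0L : a.sum();
    }

    // Prints one line per stage: name, number of calls, total milliseconds.
    public void report() {
        for (String stage : nanos.keySet()) {
            System.out.println(stage + ": " + callCount(stage) + " calls, "
                    + totalNanos(stage) / 1_000_000 + " ms total");
        }
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        for (int i = 0; i < 1000; i++) {
            String name = "file" + i;
            // The lambdas are placeholders for the real module invocations.
            String tpd = timer.time("module1", () -> name + ".tpd");
            String adv = timer.time("module2", () -> tpd.replace(".tpd", ".adv"));
            timer.time("module3", () -> adv.replace(".adv", ".tpl"));
        }
        timer.report();
    }
}
```

If one stage accounts for nearly all of the time and that stage is mostly reading or writing files, adding threads is unlikely to help much.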

Taking future multithreaded optimizations into account while writing the code is OK; the architecture should be flexible enough to allow for them later.

Dariusz answered Jan 15 '23 18:01