
How to efficiently process 300+ files concurrently in Scala

Tags:

file-io

scala

I'm going to work on comparing around 300 binary files in Scala, byte by byte, 4MB each. However, judging from what I've already done, processing 15 files at the same time using java.io.BufferedInputStream took around 90 seconds on my machine, so I don't think my solution will scale well to a large number of files.

Ideas and suggestions are highly appreciated.

EDIT: The actual task is not just to compare the files for differences, but to process them in the same sequential order. Let's say I have to look at the ith byte in every file at the same time, then move on to byte (i + 1).
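
A minimal sketch of the lock-step pattern described above (the "data" directory name and the mismatch check are just placeholders for illustration):

    import java.io.{BufferedInputStream, File, FileInputStream}

    object LockStepRead {
      def main(args: Array[String]): Unit = {
        // Hypothetical input directory holding the ~300 binary files.
        val files: Seq[File] = new File("data").listFiles().toSeq.sortBy(_.getName)
        val streams = files.map(f => new BufferedInputStream(new FileInputStream(f)))

        try {
          var pos = 0L
          var done = false
          while (!done) {
            // Read byte i from every file before moving on to byte i + 1.
            val column = streams.map(_.read())
            if (column.contains(-1)) done = true   // at least one file is exhausted
            else {
              if (column.distinct.size > 1)        // example processing step: detect a mismatch
                println(s"files differ at byte $pos")
              pos += 1
            }
          }
        } finally streams.foreach(_.close())
      }
    }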

asked Dec 30 '22 by Ekkmanz


1 Answer

Did you notice your hard drive slowly evaporating as you read the files? Reading that many files in parallel is not something mechanical hard drives are designed to do at full speed.

If the files will always be this small (4MB is plenty small enough), I would read the entire first file into memory, and then compare each file with it in series.
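
Something along these lines is a minimal sketch of that idea, assuming the files live in a hypothetical "data" directory and that a whole-file equality check is the comparison you need:

    import java.io.File
    import java.nio.file.Files

    object CompareAgainstFirst {
      def main(args: Array[String]): Unit = {
        // Hypothetical input directory; adjust to where the ~300 files actually live.
        val files: Seq[File] = new File("data").listFiles().toSeq.sortBy(_.getName)

        // Load the first file once; at ~4MB it fits comfortably in memory.
        val reference: Array[Byte] = Files.readAllBytes(files.head.toPath)

        // Compare every remaining file against the in-memory reference, one at a time.
        for (f <- files.tail) {
          val bytes = Files.readAllBytes(f.toPath)
          val identical = java.util.Arrays.equals(reference, bytes)
          println(s"${f.getName}: ${if (identical) "identical" else "differs"}")
        }
      }
    }

Reading one file at a time keeps the disk access sequential, which is where a mechanical drive performs best.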

I can't comment on solid-state drives, as I have no first-hand experience with their performance.

answered Mar 02 '23 by zildjohn01