I have to dump 6 million files, each containing around 100-200 characters, and it's painfully slow. The slow part is the file writing: if I comment out the call to the WriteSoveraFile method, the whole thing runs in 5-10 minutes. As it is, I ran it overnight (16 hours) and only got through 2 million records.
Is there a faster method?
Would I be better off creating an array of arrays and then dumping it all at once? (My system only has 4 GB; wouldn't it die from the 6 GB of data consumed by this?)
Here is the procedure:
    public static void WriteSoveraFile(String fileName, String path, String contents) throws IOException {
        BufferedWriter bw = null;
        try {
            String outputFolderPath = cloGetAsFile( GenCCD.o_OutER7Folder ).getAbsolutePath() ;
            File folder = new File( String.format("%1$s/Sovera/%2$s/", outputFolderPath, path) );
            if (! folder.exists()) {
                folder.mkdirs();
                /* if (this.rcmdWriter != null)
                       this.rcmdWriter.close();
                */
            }
            File file = new File( String.format("%1$s/%2$s", folder.getAbsolutePath(),fileName) );
            // if file doesnt exists, then create it
            if (!file.exists()) {
                file.createNewFile();
                FileWriter fw = new FileWriter(file.getAbsoluteFile());
                bw = new BufferedWriter(fw);
                bw.write(contents);
                bw.close();
            }
            /* else {
                   file.delete(); // want to delete the file?? or just overwrite it??
                   file.createNewFile();*/
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (bw != null) bw.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
This is almost certainly an OS filesystem issue; writing lots of files simply is slow. I recommend writing a comparison test in shell and in C to get an idea of how much the OS is contributing. Additionally, I would suggest two major tweaks:
FileWriter may block on the close() operation.
(I was going to suggest looking into NIO, but the APIs don't seem to offer much benefit for your situation, since setting up an mmapped buffer would probably introduce more overhead than it would save for this size.)
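The comparison test recommended above is meant to be written in shell and in C. As a rough Java-side stand-in (not a replacement for that test), the sketch below times writing many small files against writing the same number of bytes into one file. A large gap between the two would suggest that per-file overhead (create, open, close, metadata updates) dominates rather than raw throughput. The class name, file names, record count, and payload size are all made up for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; not part of the original code.
class SmallFileWriteBenchmark {

    /** Writes `count` separate files of `payload` bytes each; returns elapsed nanoseconds. */
    static long timeManySmallFiles(Path dir, int count, byte[] payload) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            // One create/open/write/close cycle per record, as in the question.
            Files.write(dir.resolve("record-" + i + ".txt"), payload);
        }
        return System.nanoTime() - start;
    }

    /** Writes the same total number of bytes into a single file; returns elapsed nanoseconds. */
    static long timeOneLargeFile(Path dir, int count, byte[] payload) throws IOException {
        long start = System.nanoTime();
        try (OutputStream out =
                 new BufferedOutputStream(Files.newOutputStream(dir.resolve("all-records.txt")))) {
            for (int i = 0; i < count; i++) {
                out.write(payload);
            }
        }
        return System.nanoTime() - start;
    }

    /** Deletes the directory tree so the benchmark leaves nothing behind. */
    static void deleteRecursively(Path root) throws IOException {
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order so files are removed before their parent directory.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        int count = 10_000;             // illustrative; far below the 6 million in the question
        byte[] payload = new byte[150]; // roughly the 100-200 characters per record
        Path dir = Files.createTempDirectory("small-file-bench");
        try {
            long many = timeManySmallFiles(dir, count, payload);
            long one = timeOneLargeFile(dir, count, payload);
            System.out.printf("%d small files: %d ms%n", count, many / 1_000_000);
            System.out.printf("one large file: %d ms%n", one / 1_000_000);
        } finally {
            deleteRecursively(dir);
        }
    }
}
```

The numbers depend heavily on the filesystem, the disk, and OS caching, so this is only a first indication; running it on the actual output volume is more telling than running it in the temp directory.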
As has been mentioned, your limiting factor is storage access, not your code or the JVM. There are a few things in your code that could be improved, but the changes would go unnoticed since the underlying bottleneck is the file I/O.
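The answer does not say which code improvements it has in mind, so the following is only a guess at what a tidier version of the write method might look like. It remembers folders it has already created instead of calling exists() and mkdirs() for every file, drops the separate createNewFile() call, and lets the IOException propagate rather than printing it. The class name SoveraWriter, the outputRoot parameter (standing in for the folder returned by cloGetAsFile), and the choice of UTF-8 are assumptions; the original FileWriter uses the platform default charset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper class; not part of the original code. Not thread-safe.
class SoveraWriter {
    private final Path root;
    // Folders already created in this run, so each is checked only once.
    private final Set<Path> knownFolders = new HashSet<>();

    SoveraWriter(Path outputRoot) {
        this.root = outputRoot.resolve("Sovera");
    }

    /**
     * Writes contents to root/path/fileName.
     * Returns true if the file was written, false if it already existed
     * (existing files are left untouched, as in the original method).
     */
    boolean write(String fileName, String path, String contents) throws IOException {
        Path folder = root.resolve(path);
        if (knownFolders.add(folder)) {
            // No-op if the directory already exists on disk.
            Files.createDirectories(folder);
        }
        try {
            // CREATE_NEW creates the file and fails if it exists, in a single call.
            Files.write(folder.resolve(fileName),
                        contents.getBytes(StandardCharsets.UTF_8), // assumed encoding
                        StandardOpenOption.CREATE_NEW,
                        StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }
}
```

As the answer points out, changes like these are unlikely to be noticeable while the per-file I/O cost dominates; they mostly remove redundant filesystem calls and make failures visible to the caller.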
There are some possible ways to speed up the process: