I'm trying to create 300M files from a Java program. I switched from the old file API to the new Java 7 NIO package, but the new package is going even slower than the old one.
I see less CPU utilization than I did with the old file API, but I'm running this simple code and getting only 0.5 MB/sec of file transfer. The job reads off one disk and writes to another, and the writer is the only process accessing the destination disk.
Files.write(FileSystems.getDefault().getPath(filePath), fiveToTenKBytes, StandardOpenOption.CREATE);
Is there any hope of getting a reasonable throughput here?
Update:
I'm unpacking 300 million 5-10 KB image files from large archive files. I have 3 disks, 1 local and 2 SAN-attached (all have a typical throughput of ~20 MB/sec on large files).
I've also tried this code, which improved speed to just under 2 MB/sec throughput (9-ish days to unpack all the files).
ByteBuffer byteBuffer = ByteBuffer.wrap(imageBinary, 0, ((BytesWritable) value).getLength());
FileOutputStream fos = new FileOutputStream(imageFile);
fos.getChannel().write(byteBuffer);
fos.close();
I read from the local disk and write to a SAN-attached disk. I'm reading from a Hadoop SequenceFile; Hadoop can typically read these files at 20 MB/sec using basically the same code.
The only thing that appears out of place, other than the extreme slowness, is that I see about 2:1 more read I/O than write I/O, even though the sequence file is gzipped (images compress at virtually 1:1), so the compressed input should be roughly 1:1 with the output.
2nd UPDATE
Looking at iostat, I see some odd numbers. We're looking at xvdf here; I have one Java process reading from xvdb and writing to xvdf, and no other processes active on xvdf.
iostat -d 30

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            1.37         5.60         4.13        168        124
xvdb             14.80       620.00         0.00      18600          0
xvdap3            0.00         0.00         0.00          0          0
xvdf            668.50      2638.40       282.27      79152       8468
xvdg           1052.70      3751.87      2315.47     112556      69464
The reads on xvdf are nearly 10x the writes; that's unbelievable.
fstab
/dev/xvdf /mnt/ebs1 auto defaults,noatime,nodiratime 0 0
/dev/xvdg /mnt/ebs2 auto defaults,noatime,nodiratime 0 0
If I understood your code correctly, you're writing the 300M files in small chunks (fiveToTenKBytes).
Consider using a stream approach. If you're writing to disk, consider wrapping the OutputStream in a BufferedOutputStream.
E.g. something like:
try (BufferedOutputStream bos = new BufferedOutputStream(
        Files.newOutputStream(Paths.get(filePathString), StandardOpenOption.CREATE))) {
    // ... write the image bytes ...
}
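For reference, a complete version of that sketch might look like the following (writeImage and its parameters are illustrative names; the byte[] and its length would come from your SequenceFile loop):

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ImageWriter {

    // Writes one image's bytes to its own file through a BufferedOutputStream.
    static void writeImage(String filePathString, byte[] imageBytes, int length) throws IOException {
        try (BufferedOutputStream bos = new BufferedOutputStream(
                Files.newOutputStream(Paths.get(filePathString), StandardOpenOption.CREATE))) {
            bos.write(imageBytes, 0, length);
        } // close() flushes the buffer and releases the file handle
    }
}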
I think your slowness is coming from creating new files, not actual transfer. I believe that creating a file is a synchronous operation in Linux: the system call will not return until the file has been created and the directory updated. This suggests a couple of things you can do:
- Read the data for each file into a byte[], then create a Runnable that writes the output file from this array. Use a threadpool with lots of threads -- maybe 100 or more -- because they'll be spending most of their time waiting for the creat() to complete. Set the capacity of this pool's inbound queue based on the amount of memory you have: if your files are 10k in size, then a queue capacity of 1,000 seems reasonable (there's no good reason to allow the reader to get too far ahead of the writers, so you could even go with a capacity of twice the number of threads). See the sketch below for one way to wire this up.
- Rather than NIO, use plain BufferedInputStreams and BufferedOutputStreams. Your problem here is syscalls, not memory speed (the NIO classes are designed to prevent copies between heap and off-heap memory).

I'm going to assume that you already know not to attempt to store all the files in a single directory, or even to store more than a few hundred files in one directory.
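Here is a minimal sketch of that producer-consumer arrangement, assuming a single reader thread that calls submit() for each image it extracts from the SequenceFile (class and method names are illustrative):

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ParallelFileWriter {

    private static final int THREADS = 100;          // most threads just sit waiting on creat()
    private static final int QUEUE_CAPACITY = 1000;  // bounds how far the reader gets ahead of the writers

    // CallerRunsPolicy makes the reading thread write a file itself when the
    // queue is full, which throttles it and keeps memory use bounded.
    private final ExecutorService pool = new ThreadPoolExecutor(
            THREADS, THREADS, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<Runnable>(QUEUE_CAPACITY),
            new ThreadPoolExecutor.CallerRunsPolicy());

    // Called by the single reader thread for each image pulled from the SequenceFile.
    // imageBytes must be a per-file copy (e.g. Arrays.copyOf), since Hadoop's
    // BytesWritable reuses its backing buffer between records.
    public void submit(final String filePath, final byte[] imageBytes) {
        pool.execute(new Runnable() {
            @Override
            public void run() {
                try (OutputStream out = new BufferedOutputStream(new FileOutputStream(filePath))) {
                    out.write(imageBytes);
                } catch (IOException e) {
                    throw new RuntimeException("failed to write " + filePath, e);
                }
            }
        });
    }

    // Call once the reader has submitted everything.
    public void shutdownAndWait() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }
}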
And as another alternative, have you considered S3 for storage? I'm guessing that its bucket keys are far more efficient than actual directories, and there is a filesystem that lets you access buckets as if they were files (haven't tried it myself).