
Java FileOutputStream consecutive close takes a long time

Tags: java, java-io

I'm facing a little weird situation.

I'm copying a file of about 500MB from a FileInputStream to a FileOutputStream. The copy itself goes well (it takes around 500ms). When I close this FileOutputStream the FIRST time, it takes about 1ms.

But here comes the catch: when I run this again, every subsequent close takes around 1500-2000ms! The duration drops back to 1ms once I delete the output file.

Is there some essential java.io knowledge I'm missing?

It seems to be related to the OS. I'm running Arch Linux (the same code run on Windows 7 has all close times under 20ms). Note that it doesn't matter whether it runs on OpenJDK or Oracle's JDK. The hard drive is a solid-state drive with an ext4 file system.

Here is my testing code:

public void copyMultipleTimes() throws IOException {
    copy();
    copy();
    copy();
    new File("/home/d1x/temp/500mb.out").delete();
    copy();
    copy();
    // Runtime.getRuntime().exec("sync") => same results
    // Thread.sleep(30000) => same results
    // combination of sync & sleep => same results
    copy();
}

private void copy() throws IOException {
    FileInputStream fis = new FileInputStream("/home/d1x/temp/500mb.in");
    FileOutputStream fos = new FileOutputStream("/home/d1x/temp/500mb.out");
    IOUtils.copy(fis, fos); // copyLarge => same results
    // the copy itself always takes the same amount of time; only close() gets slower

    fis.close(); // closing the input stream is always fast
    // fos.flush(); // has no effect
    // fos.getFD().sync(); // solves the problem, but takes ~2.5s

    long start = System.currentTimeMillis();
    fos.close();
    System.out.println("OutputStream close took " + (System.currentTimeMillis() - start) + "ms");
}

The output is then:

OutputStream close took 0ms
OutputStream close took 1951ms
OutputStream close took 1934ms
OutputStream close took 1ms
OutputStream close took 1592ms
OutputStream close took 1727ms

zdenda.online asked Aug 14 '14 12:08


2 Answers

@Duncan proposed the following explanation:

The first call to close() returns quickly, yet the OS is still flushing data to disk. The subsequent calls to close() can't complete until the previous flushing is complete.

I think this is close to the mark, but not exactly correct.

I think that what is actually going on here is that the first copy fills up the operating system's file buffer cache with a large number of dirty pages. The internal daemon that flushes dirty pages to disc may start working on them, but it is still busy when you start the second copy.

When you do the second copy, the OS tries to acquire buffer cache pages for reading and writing. But since the buffer cache is full of dirty pages the read and write calls are repeatedly blocked, waiting for free pages to become available. But before a dirty page can be recycled, the data in the page needs to be written to disc. The net result is that the copy slows down to the effective data write rate.


A 30 second pause may not be sufficient to complete flushing the dirty pages to disc.
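
One way to check whether dirty pages are still pending after such a pause is to look at the Dirty and Writeback counters that Linux exposes in /proc/meminfo. The following is only a minimal sketch for doing that from Java; the class name DirtyPageProbe and the helper fieldKb are made up for illustration, and the program only works on Linux.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper (not part of the original question or answer).
class DirtyPageProbe {

    /** Returns the value in kB of a /proc/meminfo field such as "Dirty", or -1 if it is absent. */
    static long fieldKb(List<String> meminfoLines, String field) {
        String prefix = field + ":";
        for (String line : meminfoLines) {
            if (line.startsWith(prefix)) {
                // Lines look like "Dirty:               120 kB"
                String[] parts = line.substring(prefix.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: running on Linux, where /proc/meminfo exists.
        List<String> lines = Files.readAllLines(Paths.get("/proc/meminfo"), StandardCharsets.UTF_8);
        System.out.println("Dirty:     " + fieldKb(lines, "Dirty") + " kB");
        System.out.println("Writeback: " + fieldKb(lines, "Writeback") + " kB");
    }
}
```

Running this right after a copy and again after the pause would show whether the amount of dirty data has actually gone down in between.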

One thing you could try is to do an fsync(fd) or fdatasync(fd) between the copies. In Java, the way to do that is to call FileDescriptor.sync().

Now, I can't say if this is going to improve total copy throughput, but I'd expect a sync operation to be better at writing out (just) one file than relying on the page eviction algorithm to do it.
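
As a rough sketch of that suggestion, the copy could sync the output file's descriptor before closing it and time the sync separately. The class name SyncedCopy, the method copyAndSync, and the buffer size are my own choices for illustration; plain java.io is used here instead of the IOUtils call from the question.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of the original question or answer).
class SyncedCopy {

    /** Copies source to target, syncs the target to the device, and returns the ms spent in sync(). */
    static long copyAndSync(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             FileOutputStream out = new FileOutputStream(target.toFile())) {
            byte[] buffer = new byte[64 * 1024]; // arbitrary buffer size
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            long start = System.nanoTime();
            out.getFD().sync(); // ask the OS to write this file's data out now
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SyncedCopy <source> <target>
        Path source = Paths.get(args[0]);
        Path target = Paths.get(args[1]);
        for (int i = 0; i < 3; i++) {
            System.out.println("sync took " + copyAndSync(source, target) + "ms");
        }
    }
}
```

This moves the waiting to an explicit point rather than removing it, so the thing to compare is the total time per copy with and without the sync.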

Stephen C answered Nov 02 '22 23:11


You seem to be on to something interesting. Under Linux, someone is allowed to keep holding a file handle to the original file while you open it again, which actually deletes the directory entry and starts afresh. This does not bother the original file (handle). On closing, then, some directory work may happen on disk.

Test it with IOUtils.copyLarge and Files.copy:

Path target = Paths.get("/home/d1x/temp/500mb.out");
Files.copy(fis, target, StandardCopyOption.REPLACE_EXISTING);

(I once saw an IOUtils.copy that just called copyLarge, but Files.copy should behave well.)
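
For a self-contained version of that test, something like the following could be used to time repeated Files.copy calls onto the same target. The class name FilesCopyTiming and the method copyMillis are invented for this sketch, and the paths are taken from the command line rather than hard-coded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of the original question or answer).
class FilesCopyTiming {

    /** Copies source onto target, replacing any existing target, and returns the elapsed ms. */
    static long copyMillis(Path source, Path target) throws IOException {
        long start = System.nanoTime();
        try (InputStream in = Files.newInputStream(source)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FilesCopyTiming <source> <target>
        Path source = Paths.get(args[0]);
        Path target = Paths.get(args[1]);
        for (int i = 0; i < 3; i++) {
            System.out.println("Files.copy took " + copyMillis(source, target) + "ms");
        }
    }
}
```

Note that the measured time here covers the whole copy including the close of the output, so it should be compared with copy plus close time in the original test, not with the close time alone.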

Joop Eggen answered Nov 03 '22 00:11