Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Did I find a bug in java.io.PipedInputStream?

Tags:

I'm not sure, but I'm pretty certain that I've found a bug (or a non-documented feature) in the Oracle Java implementation (1.7.0_67 and 1.8.0_31 I could verify as affected).

The Symptom

When the pipe is full, a write to the pipe might wait up to one second longer than needed for the pipe to become free again. A minimal example of the problem is as follows (I've pushed the sample shown here to a repository on GitHub):

private static void threadA() throws IOException, InterruptedException {
  logA("Filling pipe...");
  pos.write(new byte[5]);
  logA("Pipe full. Writing one more byte...");
  pos.write(0);
  logA("Done.");
}

private static void threadB() throws IOException, InterruptedException {
  logB("Sleeping a bit...");
  Thread.sleep(100);
  logB("Making space in pipe...");
  pis.read();
  logB("Done.");
}

pis and pos are connected PipedInputStream and PipedOutputStream instances, respectively. logA and logB are helper functions which output the thread name (A or B), a timestamp in milliseconds and the message. The output is as follows:

     0 A: Filling pipe...
     6 B: Sleeping a bit...
     7 A: Pipe full. Writing one more byte...
   108 B: Making space in pipe...
   109 B: Done.
  1009 A: Done.

As one can see, there is one second (1000 ms) between B: Done and A: Done. This is caused by the implementation of PipedInputStream in the Oracle Java 1.7.0_67, which is as follows:

private void awaitSpace() throws IOException {
    while (in == out) {
        checkStateForReceive();

        /* full: kick any waiting readers */
        notifyAll();
        try {
            wait(1000);
        } catch (InterruptedException ex) {
            throw new java.io.InterruptedIOException();
        }
    }
}

The wait(1000) is only interrupted by either hitting the timeout (1000 ms, as seen above), or a call to notifyAll(), which only happens in the following cases:

  • In awaitSpace(), before wait(1000), as we can seein the snippet above
  • In receivedLast(), which is called when the stream is closed (not applicable here)
  • In read(), but only if read() is waiting for an empty buffer to fill up -- also not applicable here

The Question

Does anyone have enough experience with Java to tell me whether this is supposed to be intended behavior? The method awaitSpace() is used by PipedOutputStream.write(...) to wait for free space, and their contract simply states:

This method blocks until all the bytes are written to the output stream.

While this is strictly not violated, 1 second of waiting time seems quite long. If I were to fix this (minimize/lower waiting time), I'd suggest inserting a notifyAll() at the end of each read to make sure that a waiting writer gets notified. To avoid additional time overhead for synchronization, a simple boolean flag could be used (and wouldn't hurt thread safety).

Affected Java Versions

So far, I could verify this on Java 7 and Java 8, the following versions to be precise:

$ java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

$ java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
like image 770
Michael Borkowski Avatar asked Feb 19 '15 21:02

Michael Borkowski


1 Answers

This is well known issue in Piped*Stream and final resolution (for JDK-8014239) is "Won't fix".

JDK-4545831: PipedInputStream performance problems

The class blocks for read when the buffer is empty and blocks for write then the buffer is full. It blocks by calling wait(1000), however a reader will only be woken by a writer who encounters a full buffer (or the wait times out) and a writer will only be woken by a reader who encounters an empty buffer (or the wait times out).

Customer Workaround : Notify()ing the PipedInputStream after every read()/write() would probably solve the problem, but still results in suboptimal performance as many unnecessary notify() calls are being made.


JDK-4404700: PipedInputStream too slow due to polling (alt implementation proposed)

The java.io.PipedInputStream is too slow because it polls to check for new data. Every second it tests if new data is available. When data is available it potentially wastes almost a second. It also has an unsettable small buffer. I propose to consider the following implementation of both PipedInputStream and PipedOutputStream, which is simpler and much faster.

BT2:EVALUATION

We should keep this around as a target of opportunity for merlin and tiger. Due to the age of the classes the submitted code is designed to replace, there may be compatibility issues involved in using it.


JDK-8014239: PipedInputStream not notifying waiting readers on receive

When reading/writing from PipedInputStream/PipedOutputStream pair, read() blocks exactly for one second when new data is written into PipedOutputStream. The reason for this is that PipedInputStream only wakes waiting readers, when during receive() the buffer is filled. The solution is very simple, add a notifyAll() at the end of both receive() methods in PipedInputStream.

It's not obvious how the majority of real-life scenarios would benefit from the proposed change. Per-write notifications may lead to an unnecessary writer stalls. Thus defeating one of the main purposes of a pipe -- time-decoupling readers from writers and buffering. PipedInputStream/PipedWriter API gives us a flexible way to control how often we would like the reader(s) to be notified on new data. Namely, flush(). Calling flush() at the right time we can control latency and throughput.

like image 81
Nikolay Avatar answered Sep 17 '22 15:09

Nikolay