 

Justification for the design of the public interface of ByteArrayOutputStream?

Many Java standard and third-party libraries expose methods in their public APIs for writing to or reading from a stream. One example is javax.imageio.ImageIO.write(), which takes an OutputStream and writes the content of a processed image to it. Another is the iText PDF processing library, which takes an OutputStream and writes the resulting PDF to it. A third is the AmazonS3 Java API, which takes an InputStream, reads it, and creates a file in their S3 storage.

The problem arises when you want to combine two of these. For example, I have an image as a BufferedImage, for which I have to use ImageIO.write to push the result into an OutputStream. But there is no direct way to push it to Amazon S3, as S3 requires an InputStream.
There are a few ways to work around this, but the subject of this question is the use of ByteArrayOutputStream.
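
To make the workaround concrete, here is a minimal sketch of the round trip described above. It is only an illustration: `ImageRoundTrip` and `consume` are hypothetical names, and `consume` merely stands in for any consumer that accepts an InputStream (such as an upload call); no S3 API is used.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

class ImageRoundTrip {
    // Encodes the image as PNG into memory, then hands back a readable stream.
    static InputStream toInputStream(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("no PNG writer available");
        }
        byte[] encoded = out.toByteArray(); // copies the stream's internal buffer
        return new ByteArrayInputStream(encoded);
    }

    // Hypothetical stand-in for a consumer that only accepts an InputStream.
    // It drains the stream and returns the number of bytes it saw.
    static long consume(InputStream in) throws IOException {
        long total = 0;
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
        }
        return total;
    }
}
```

The call to toByteArray() is the step the rest of this question is about.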

The idea behind ByteArrayOutputStream is to use an intermediate byte array wrapped in input/output streams, so that the code that wants to write to an output stream writes to the array, and the code that wants to read reads from the array.

What I wonder is why ByteArrayOutputStream does not allow any access to the byte array without copying it, for example by providing an InputStream that has direct access to it. The only way to access it is to call toByteArray(), which makes a copy of the internal array (in the standard implementation). This means that, in my image example, I will have three copies of the image in memory:

  • First is the actual BufferedImage,
  • second is the internal array of the OutputStream and
  • third is the copy produced by toByteArray() so I can create the InputStream.
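
The copying behavior of toByteArray() can be checked directly. The following is a small sketch (the class and method names are made up for illustration) showing that each call returns a fresh array and that changing one copy does not affect the stream's own contents:

```java
import java.io.ByteArrayOutputStream;

class CopyDemo {
    // Returns true if toByteArray() hands out independent copies.
    static boolean returnsFreshCopy() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(1);
        out.write(2);
        out.write(3);

        byte[] first = out.toByteArray();
        byte[] second = out.toByteArray();
        first[0] = 99; // mutate one copy only

        return first != second             // two distinct array objects
            && second[0] == 1              // the other copy is untouched
            && out.toByteArray()[0] == 1;  // so is the stream's internal data
    }
}
```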

How is this design justified?

  • Hiding implementation? Just provide getInputStream(), and the implementation stays hidden.
  • Multi-threading? ByteArrayOutputStream is not suited for access by multiple threads anyway, so this cannot be the reason.

Moreover, there is a second flavor of ByteArrayOutputStream, provided by Apache's commons-io library (which has a different internal implementation). But both have exactly the same public interface, which provides no way to access the byte array without copying it.

Op De Cirkel asked Dec 28 '22 19:12


1 Answer

What I wonder is why ByteArrayOutputStream does not allow any access to the byte array without copying it, for example by providing an InputStream that has direct access to it.

I can think of four reasons:

  • The current implementation uses a single byte array, but it could also be implemented as a linked list of byte arrays, deferring the creation of the final array until the application asks for it. If the application could see the actual byte buffer, it would have to be a single array.

  • Contrary to your understanding, ByteArrayOutputStream is thread-safe and is suitable for use in multi-threaded applications. But if direct access were provided to the byte array, it is difficult to see how that could be synchronized without creating other problems.

  • The API would need to be more complicated because the application also needs to know where the current buffer high water mark is, and whether the byte array is (still) the live byte array. (The ByteArrayOutputStream implementation occasionally needs to reallocate the byte array ... and that will leave the application holding a reference to an array that is no longer the array.)

  • When you expose the byte array, you allow an application to modify the contents of the array, which could be problematic.
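
The third point (the high water mark and the reallocated array) can be illustrated with a short sketch. It assumes a hypothetical subclass, `PeekableStream`, that leaks the protected `buf` and `count` fields of java.io.ByteArrayOutputStream; nothing like it exists in the JDK, and it is here only to show what a caller holding the raw array would have to cope with:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical subclass that exposes the internals, for demonstration only.
class PeekableStream extends ByteArrayOutputStream {
    PeekableStream(int initialCapacity) {
        super(initialCapacity);
    }

    synchronized byte[] liveBuffer() {
        return buf;   // the array currently backing the stream
    }

    synchronized int highWaterMark() {
        return count; // number of valid bytes in buf
    }
}

class StaleBufferDemo {
    // Returns true if a previously obtained buffer reference becomes stale.
    static boolean referenceGoesStale() {
        PeekableStream out = new PeekableStream(4);
        out.write(new byte[] {1, 2, 3, 4}, 0, 4); // fills the initial capacity
        byte[] before = out.liveBuffer();

        out.write(5);                             // forces the stream to grow
        byte[] after = out.liveBuffer();

        // 'before' still holds only the first 4 bytes; the stream moved on.
        return before != after
            && before.length == 4
            && out.highWaterMark() == 5;
    }
}
```

A caller that kept `before` would silently miss every byte written after the reallocation, which is why an API exposing the array would also need to expose the valid length and some notion of whether the array is still current.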


How is this design justified?

The design is tailored for simpler use-cases than yours. The Java SE class libraries don't aim to support all possible use-cases. But they don't prevent you (or a 3rd party library) from providing other stream classes for other use-cases.


The bottom line is that the Sun designers decided NOT to expose the byte array for ByteArrayOutputStream, and (IMO) you are unlikely to change their minds.

(And if you want to try, this is not the right place to do it.

  • Try submitting an RFE via the Bugs database.
  • Or develop a patch that adds the functionality and submit it to the OpenJDK team via the relevant channels. You would increase your chances if you included comprehensive unit tests and documentation.)

You might have more success convincing the Apache Commons IO developers of the rightness of your arguments, provided that you can come up with an API design that isn't too dangerous.

Alternatively, there's nothing stopping you from just implementing your own special purpose version that exposes its internal data structures. The code is GPL'ed so you can copy it ... subject to the normal GPL rules about code distribution.
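
One possible shape for such a special-purpose version is sketched below. This is not code from the JDK or from Commons IO, and the class and method names are made up. It relies on the fact that java.io.ByteArrayOutputStream declares its `buf` and `count` fields as protected, so a subclass can wrap them in a ByteArrayInputStream without copying:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

// Hypothetical subclass; shares the internal array instead of copying it.
class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    // Returns a stream over the bytes written so far, backed by the same array.
    // Finish writing before calling this: later writes are not visible to the
    // returned stream and may move the data to a new array.
    synchronized InputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}
```

The caveats from the list above still apply: the returned stream is a snapshot of the array reference and length at the time of the call, so it is only safe once the writer is done.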

Stephen C answered Dec 31 '22 15:12