Downsides of fragmented arrays for dynamic byte storage [closed]

Tags: java, arrays

The default ByteArrayOutputStream seems a rather wasteful implementation, and I was wondering if there is a specific reason for this. First, it keeps a single fixed-size array as its backing store. When that array is full, it allocates a larger one and copies the old contents into it (more memory and more overhead). Then, calling toByteArray() copies the array yet again.
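As a rough illustration of the copying described above, here is a minimal sketch. The class `ToByteArrayCopyDemo` and its `fill` helper are hypothetical; only `ByteArrayOutputStream` is a real JDK class. Its Javadoc states that `toByteArray()` "creates a newly allocated byte array", so every call returns a fresh copy.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class; only ByteArrayOutputStream itself is a real JDK API.
class ToByteArrayCopyDemo {
    // Writes 'count' bytes one at a time into a stream whose initial capacity
    // is only 4, so the internal array has to be reallocated and copied
    // several times as it grows.
    static ByteArrayOutputStream fill(int count, byte value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(4);
        for (int i = 0; i < count; i++) {
            out.write(value);
        }
        return out;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = fill(100, (byte) 7);

        // Every call to toByteArray() allocates a fresh copy of the contents.
        byte[] first = out.toByteArray();
        byte[] second = out.toByteArray();
        System.out.println(first != second);      // true: two distinct arrays

        first[0] = 42;                            // modifying the copy...
        System.out.println(out.toByteArray()[0]); // ...does not touch the stream: prints 7
    }
}
```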

ByteBuffers are nice but also fixed in size; they merely offer a view on a single array, nothing more.
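A small sketch of both points, using a hypothetical demo class around the real `java.nio.ByteBuffer` API: `ByteBuffer.wrap` gives a view whose writes land directly in the wrapped array, and a relative `put` past the capacity throws `BufferOverflowException` instead of growing the buffer.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical demo class around the real java.nio.ByteBuffer API.
class ByteBufferCapacityDemo {
    // Fills a buffer to capacity, then tries to put one more byte.
    // Returns true if that extra put is rejected.
    static boolean overflowsWhenFull(int capacity) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        for (int i = 0; i < capacity; i++) {
            buf.put((byte) i);            // fills the buffer exactly
        }
        try {
            buf.put((byte) 0);            // one byte too many
            return false;
        } catch (BufferOverflowException e) {
            return true;                  // the capacity never grows
        }
    }

    public static void main(String[] args) {
        byte[] backing = new byte[4];
        ByteBuffer view = ByteBuffer.wrap(backing);
        view.put((byte) 9);                            // writes straight into 'backing'
        System.out.println(backing[0]);                // 9
        System.out.println(view.array() == backing);   // true: same array, no copy
        System.out.println(overflowsWhenFull(8));      // true
    }
}
```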

I was wondering if it would be interesting to create a class (or, if one already exists, please point me towards it) that uses one or more backing arrays. Instead of duplicating the array each time it needs to expand, it would simply add a new backing array. For reading, you could easily expose an InputStream-like interface, and for writing an OutputStream-like interface.
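The idea could look roughly like the following minimal sketch. `ChunkedByteOutputStream` and its inner `ChunkReader` are hypothetical names, not an existing JDK or library class, and the chunk size is a constructor parameter chosen by the caller. Writes go into the last chunk until it is full, then a new chunk is appended; earlier chunks are never moved or copied, and reading walks the chunks directly.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class (not a JDK API): an OutputStream that grows by appending
// fixed-size chunks instead of reallocating and copying a single array.
class ChunkedByteOutputStream extends OutputStream {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private long size; // total number of bytes written so far

    ChunkedByteOutputStream(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    long size() { return size; }

    // Returns the chunk the next byte goes into, appending a new one when the
    // last chunk is full (or nothing has been written yet). Only called right
    // before at least one byte is written.
    private byte[] currentChunk() {
        if (size % chunkSize == 0) chunks.add(new byte[chunkSize]);
        return chunks.get(chunks.size() - 1);
    }

    @Override
    public void write(int b) {
        byte[] chunk = currentChunk();
        chunk[(int) (size % chunkSize)] = (byte) b;
        size++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        if (off < 0 || len < 0 || len > b.length - off) throw new IndexOutOfBoundsException();
        while (len > 0) {
            byte[] chunk = currentChunk();
            int pos = (int) (size % chunkSize);
            int n = Math.min(len, chunkSize - pos);
            System.arraycopy(b, off, chunk, pos, n); // only the new bytes are copied
            off += n;
            len -= n;
            size += n;
        }
    }

    // Returns a reader over the bytes written so far; nothing is copied up front.
    ChunkReader toInputStream() { return new ChunkReader(size); }

    class ChunkReader extends InputStream {
        private final long end; // length snapshot taken when the reader was created
        private long pos;

        ChunkReader(long end) { this.end = end; }

        @Override
        public int read() {
            if (pos >= end) return -1;
            byte v = chunks.get((int) (pos / chunkSize))[(int) (pos % chunkSize)];
            pos++;
            return v & 0xFF; // InputStream contract: unsigned value 0..255
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (off < 0 || len < 0 || len > b.length - off) throw new IndexOutOfBoundsException();
            if (len == 0) return 0;
            if (pos >= end) return -1;
            int total = 0;
            while (len > 0 && pos < end) {
                int inChunk = (int) (pos % chunkSize);
                int n = (int) Math.min(Math.min(len, chunkSize - inChunk), end - pos);
                System.arraycopy(chunks.get((int) (pos / chunkSize)), inChunk, b, off, n);
                pos += n;
                off += n;
                len -= n;
                total += n;
            }
            return total;
        }
    }
}
```

For example, with a chunk size of 4, writing 10 bytes allocates three chunks (4 + 4 + 2 bytes used), and none of the earlier chunks is copied when a new one is added.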

Any feedback on whether such a thing already exists, and if not, why not? Does it have some downside I'm not seeing?

asked Oct 04 '13 06:10 by nablex


1 Answer

This is actually a great idea, especially for large data.

You can quickly run into memory problems when allocating huge arrays on the heap, as they need a contiguous block of free memory. We once had such a situation: we frequently allocated byte arrays of 10 to 50 MB and ran into OutOfMemoryErrors, not because too little memory was available (we usually had 90%, or 900 MB, free), but because heap fragmentation meant there was no single contiguous block of memory large enough for the array.

We ended up creating a Blob class that internally stored the data as a chained list of smaller arrays (chunks). The arrays had a fixed size (essential for fast lookups, since you can quickly calculate the array and offset for a given index), and we created InputStream and OutputStream classes for this Blob. Later we extended it so that it could be swapped to and from disk.
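The fixed chunk size is what makes lookups cheap: the chunk is `index / chunkSize` and the offset is `index % chunkSize`. A minimal sketch of that arithmetic follows. `ChunkedBlob` is a hypothetical class, not the answer's actual Blob code, and the 64 KiB chunk size is an arbitrary assumption; choosing a power of two lets the division and remainder become a shift and a mask.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the answer's Blob): random access into a list of
// fixed-size chunks. Assumption: 64 KiB chunks, a power of two.
class ChunkedBlob {
    private static final int CHUNK_BITS = 16;
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;   // 65536 bytes
    private static final int OFFSET_MASK = CHUNK_SIZE - 1;

    private final List<byte[]> chunks = new ArrayList<>();
    private long size;

    // Equivalent to index / CHUNK_SIZE for non-negative indices.
    static int chunkIndex(long index) { return (int) (index >>> CHUNK_BITS); }

    // Equivalent to index % CHUNK_SIZE for non-negative indices.
    static int chunkOffset(long index) { return (int) (index & OFFSET_MASK); }

    long size() { return size; }

    void append(byte b) {
        if (chunkOffset(size) == 0) chunks.add(new byte[CHUNK_SIZE]); // start a new chunk
        chunks.get(chunkIndex(size))[chunkOffset(size)] = b;
        size++;
    }

    byte get(long index) {
        checkIndex(index);
        return chunks.get(chunkIndex(index))[chunkOffset(index)];
    }

    void set(long index, byte b) {
        checkIndex(index);
        chunks.get(chunkIndex(index))[chunkOffset(index)] = b;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```

A power-of-two size is not required (plain `/` and `%` work for any chunk size), but it keeps each lookup to a shift and a mask. The cost of fixed-size chunks is that the last one may be partly unused, by at most CHUNK_SIZE - 1 bytes.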

  • Downside? None, apart from a little simple programming effort.
  • Benefits? Efficient storage of large data in memory, no more problems with heap fragmentation.

I can only encourage you to give it a go!

answered Oct 20 '22 19:10 by Peter Walser